 bondscell_results #$d842ad86-d294-11ef-3266-253f80ecf4b7queued¤logsrunning¦outputbody?<div class="markdown"><h2 id="Discrete-Data:-the-1-of-K-Coding-Scheme">Discrete Data: the 1-of-K Coding Scheme</h2>
<p>Consider a coin-tossing experiment with outcomes <span class="tex">$x \in\&#123;0,1\&#125;$</span> &#40;tail and head, respectively&#41; and let <span class="tex">$0\leq \mu \leq 1$</span> represent the probability of heads. The data-generating distribution for this model can be written as a <a href="https://en.wikipedia.org/wiki/Bernoulli_distribution"><strong>Bernoulli distribution</strong></a>:</p>
<p class="tex">$$ 
p&#40;x|\mu&#41; &#61; \mu^&#123;x&#125;&#40;1-\mu&#41;^&#123;1-x&#125;$$</p>
<p>Note that the variable <span class="tex">$x$</span> acts as a &#40;binary&#41; <strong>selector</strong> for the tail or head probabilities. Think of this as an &#39;if&#39;-statement in programming.</p>
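<p>A minimal Python sketch of the selector idea, assuming an illustrative value <code>mu = 0.3</code> that is not taken from the text; the function names are hypothetical. It suggests that the exponent form and an explicit if-statement assign the same probabilities:</p>

```python
def bernoulli_pmf(x: int, mu: float) -> float:
    """p(x|mu) = mu**x * (1 - mu)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (tail) or 1 (head)")
    return mu**x * (1 - mu) ** (1 - x)


def bernoulli_if(x: int, mu: float) -> float:
    """The same distribution written as an explicit if-statement."""
    return mu if x == 1 else 1 - mu


mu = 0.3  # assumed probability of heads, chosen only for illustration
for x in (0, 1):
    # Both columns should agree for every outcome x.
    print(x, bernoulli_pmf(x, mu), bernoulli_if(x, mu))
```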
</div>mimetext/htmlrootassigneelast_run_timestampA}*3persist_js_state·has_pluto_hook_features§cell_id$d842ad86-d294-11ef-3266-253f80ecf4b7depends_on_disabled_cells§runtime 
嘵published_object_keysdepends_on_skipped_cells§errored$d843b33c-d294-11ef-195d-2708fbfba49dqueued¤logsrunning¦outputbody|<div class="markdown"><p>We recognize the <span class="tex">$&#40;\alpha_k&#41;$</span>&#39;s as prior pseudo-counts and the Dirichlet distribution shows to be a <a href="https://en.wikipedia.org/wiki/Conjugate_prior">conjugate prior</a> to the categorical/multinomial:</p>
<p class="tex">$$\begin&#123;align*&#125;
\underbrace&#123;\text&#123;Dirichlet&#125;&#125;_&#123;\text&#123;posterior&#125;&#125; &amp;\propto \underbrace&#123;\text&#123;categorical&#125;&#125;_&#123;\text&#123;likelihood&#125;&#125; \cdot \underbrace&#123;\text&#123;Dirichlet&#125;&#125;_&#123;\text&#123;prior&#125;&#125;
\end&#123;align*&#125;$$</p>
</div>mimetext/htmlrootassigneelast_run_timestampA}*Ϋpersist_js_state·has_pluto_hook_features§cell_id$d843b33c-d294-11ef-195d-2708fbfba49ddepends_on_disabled_cells§runtime published_object_keysdepends_on_skipped_cells§errored$d843efdc-d294-11ef-0f3a-630ecdd0aceequeued¤logsrunning¦outputbody|<div class="markdown"><p>A related distribution is the distribution over count observations <span class="tex">$D_m&#61;\&#123;m_1,\ldots,m_K\&#125;$</span>, which is called the <strong>multinomial distribution</strong>,</p>
<p class="tex">$$p&#40;D_m|\mu&#41; &#61;\frac&#123;N&#33;&#125;&#123;m_1&#33; m_2&#33;\ldots m_K&#33;&#125; \,\prod_k \mu_k^&#123;m_k&#125;\,.$$</p>
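<p>As a rough illustration of the multinomial formula, the following Python sketch evaluates it for a small count vector. The function name and the example numbers are assumptions made for this sketch:</p>

```python
from math import factorial, prod


def multinomial_pmf(m, mu):
    """p(D_m|mu) = N! / (m_1! ... m_K!) * prod_k mu_k**m_k."""
    N = sum(m)
    coef = factorial(N)
    for m_k in m:
        coef //= factorial(m_k)  # each intermediate quotient stays an integer
    return coef * prod(mu_k**m_k for mu_k, m_k in zip(mu, m))


# Assumed example: a fair 3-sided die thrown N = 4 times.
mu = [1 / 3, 1 / 3, 1 / 3]
print(multinomial_pmf([2, 1, 1], mu))
```

<p>The combinatorial factor counts the number of distinct throw sequences that produce the same counts, which is what separates this distribution from a product of categorical terms.</p>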
</div>mimetext/htmlrootassigneelast_run_timestampA}*u persist_js_state·has_pluto_hook_features§cell_id$d843efdc-d294-11ef-0f3a-630ecdd0aceedepends_on_disabled_cells§runtime r@published_object_keysdepends_on_skipped_cells§errored$acdc5bfa-7188-4a37-80e6-5026ecd1a813queued¤logsrunning¦outputbodyK<div>


<div class="key-concepts-repeat">
</div>

<script>
const wrapper = currentScript.parentElement
const repeat = wrapper.querySelector(".key-concepts-repeat")




	const render = () => {
		
	const nodes = [...document.querySelectorAll("div.key-concept")].filter(n => n.closest(".key-concepts-repeat") == null)
	
	repeat.innerHTML= ""
	nodes.forEach((n) => {
		const div = document.createElement("div")
		div.style = "position: relative;    margin-block-end: 2em;"
		
		const a = document.createElement("a")
		a.href = "#" + n.closest("pluto-cell").id
		a.innerText = "↑ Jump to source"
		a.style = `
			position: absolute;
		    box-shadow: 0 0 10px #0000001a;
		    font-family: system-ui;
		    font-weight: 600;
		    text-decoration: none;
		    display: inline-flex;
		    background: var(--white);
		    border-radius: 10rem;
		    padding: 0.2em 0.8em;
		    right: 5px;
		    top: 5.5px;
		`
		div.appendChild(a)
		div.appendChild(n.cloneNode(true))
		repeat.appendChild(div)
	})

	}








	// This mutationObserver code is from PlutoUI.TableOfContents. Check that code for the latest version with comments.
const invalidated = { current: false }
	const updateCallback = () => setTimeout(render, 1000)
	render()
updateCallback()
const notebook = wrapper.closest("pluto-notebook")
const mut_observers = { current: [] }
const createCellObservers = () => {
	mut_observers.current.forEach((o) => o.disconnect())
	mut_observers.current = Array.from(notebook.querySelectorAll("pluto-cell")).map(el => {
		const o = new MutationObserver(updateCallback)
		o.observe(el, {attributeFilter: ["class"]})
		return o
	})
}
createCellObservers()
const notebookObserver = new MutationObserver(() => {
	updateCallback()
	createCellObservers()
})
notebookObserver.observe(notebook, {childList: true})

const bodyClassObserver = new MutationObserver(updateCallback)
bodyClassObserver.observe(document.body, {attributeFilter: ["class"]})

invalidation.then(() => {
	invalidated.current = true
	notebookObserver.disconnect()
	bodyClassObserver.disconnect()
	mut_observers.current.forEach((o) => o.disconnect())
})
	
</script>

mimetext/htmlrootassigneelast_run_timestampA}*/persist_js_state·has_pluto_hook_features§cell_id$acdc5bfa-7188-4a37-80e6-5026ecd1a813depends_on_disabled_cells§runtime.޵published_object_keysdepends_on_skipped_cells§errored$d843540a-d294-11ef-3846-2bf27b7e9b30queued¤logsrunning¦outputbody<div class="markdown"><h1 id="Bayesian-Density-Estimation-for-a-Loaded-Die">Bayesian Density Estimation for a Loaded Die</h1>
<p>Now let&#39;s proceed with learning the parameters of a model for <span class="tex">$N$</span> independent-and-identically-distributed &#40;IID&#41; rolls of a <span class="tex">$K$</span>-sided die, based on an observed data set <span class="tex">$D&#61;\&#123;x_1,\ldots,x_N\&#125;$</span>. </p>
</div>mimetext/htmlrootassigneelast_run_timestampA}*persist_js_state·has_pluto_hook_features§cell_id$d843540a-d294-11ef-3846-2bf27b7e9b30depends_on_disabled_cells§runtime ﯵpublished_object_keysdepends_on_skipped_cells§errored$93b8ac65-ac41-4a03-bddd-5f01ccb5b42dqueued¤logsrunning¦outputbody<div class="markdown"><h4 id="Evidence-for-the-Multinomial-Dirichlet-model-&#40;**&#41;">Evidence for the Multinomial-Dirichlet model &#40;**&#41;</h4>
<p>As above, consider the following model assumptions for <span class="tex">$N$</span> tosses with a <span class="tex">$K$</span>-sided die with parameters <span class="tex">$\mu &#61; &#40;\mu_1,\mu_2, \ldots,\mu_K&#41;$</span>.  </p>
<p class="tex">$$\begin&#123;align&#125;
p&#40;D|\mu&#41; &amp;&#61; \prod_&#123;n&#61;1&#125;^N \mathrm&#123;Cat&#125;&#40;x_n|\mu&#41; &#61; \prod_&#123;k&#61;1&#125;^&#123;K&#125; \mu_k^&#123;m_k&#125; \tag&#123;likelihood&#125;\\
p&#40;\mu|\alpha&#41; &amp;&#61; \mathrm&#123;Dir&#125;&#40;\mu|\alpha&#41; &#61; \frac&#123;1&#125;&#123;B&#40;\alpha&#41;&#125; \prod_&#123;k&#61;1&#125;^&#123;K&#125; \mu_k^&#123;\alpha_k -1&#125;   \tag&#123;prior&#125;
\end&#123;align&#125;$$</p>
<p>where <span class="tex">$B&#40;\alpha&#41; &#61; \frac&#123;\prod_k \Gamma&#40;\alpha_k&#41;&#125;&#123;\Gamma&#40;\sum_k \alpha_k&#41;&#125;$</span> is known as the multivariate <a href="https://en.wikipedia.org/wiki/Beta_function">Beta function</a>.</p>
<p>Work out both the model evidence and the posterior distribution for <span class="tex">$\mu$</span>.</p>
</div>mimetext/htmlrootassigneelast_run_timestampA}*Gpersist_js_state·has_pluto_hook_features§cell_id$93b8ac65-ac41-4a03-bddd-5f01ccb5b42ddepends_on_disabled_cells§runtime 	/published_object_keysdepends_on_skipped_cells§errored$f9977fc0-0d3f-467e-822d-72f3a338f717queued¤logsrunning¦outputbody(<div class="markdown"><div class="admonition key-concept"><p class="admonition-title">🎯 Key concept</p><p><strong></strong></p>

<p>Discrete event outcomes are typically represented via one-hot encoding, in which each outcome corresponds to a unique binary indicator vector.</p>

</div>
</div>mimetext/htmlrootassigneelast_run_timestampA}*qְpersist_js_state·has_pluto_hook_features§cell_id$f9977fc0-0d3f-467e-822d-72f3a338f717depends_on_disabled_cells§runtime>kpublished_object_keysdepends_on_skipped_cells§errored$d843d0c4-d294-11ef-10b6-cb982615d58aqueued¤logsrunning¦outputbody<div class="markdown"><h2 id=""><span id='prediction-loaded-die'>Prediction of next toss for the loaded die</span></h2>
<p>Let&#39;s apply what we have learned about the loaded die to compute the probability that we throw the <span class="tex">$k$</span>-th face at the next toss. </p>
<p class="tex">$$\begin&#123;align*&#125;
p&#40;x_&#123;\bullet,k&#125;&#61;1|D&#41;  &amp;&#61; \int p&#40;x_&#123;\bullet,k&#125;&#61;1|\mu&#41;\,p&#40;\mu|D&#41; \,\mathrm&#123;d&#125;\mu \\
  &amp;&#61; \int \mu_k \times  \mathrm&#123;Dir&#125;&#40;\mu|\,\alpha&#43;m&#41; \,\mathrm&#123;d&#125;\mu  \\
  &amp;&#61; \mathrm&#123;E&#125;\left&#91; \mu_k | D\right&#93; \\
  &amp;&#61; \frac&#123;m_k &#43; \alpha_k &#125;&#123; N&#43; \sum_k \alpha_k&#125;
\end&#123;align*&#125;$$</p>
<p>&#40;You can find the mean of the Dirichlet distribution <span class="tex">$\mathrm&#123;E&#125;\left&#91; \mu_k \right&#93;$</span> on its <a href="https://en.wikipedia.org/wiki/Dirichlet_distribution">Wikipedia page</a>.&#41; </p>
<p>This result is simply a generalization of <a href="https://en.wikipedia.org/wiki/Rule_of_succession"><strong>Laplace&#39;s rule of succession</strong></a>.</p>
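<p>The predictive formula can be sketched in a few lines of Python. The counts and the uniform prior <code>alpha = [1, ..., 1]</code> below are assumed values for illustration, and the function name is hypothetical:</p>

```python
def predictive(m, alpha):
    """Posterior predictive p(x_k = 1 | D) = (m_k + alpha_k) / (N + sum_k alpha_k)."""
    total = sum(m) + sum(alpha)
    return [(m_k + a_k) / total for m_k, a_k in zip(m, alpha)]


# Assumed data: N = 10 throws of a 6-sided die, one pseudo-count per face.
m = [3, 0, 1, 2, 0, 4]
alpha = [1, 1, 1, 1, 1, 1]
p = predictive(m, alpha)
print(p)
print(sum(p))  # the predictive probabilities sum to one
```

<p>Faces that were never observed still receive non-zero probability because of the pseudo-counts. For <span class="tex">$K&#61;2$</span> and <span class="tex">$\alpha &#61; &#40;1,1&#41;$</span> the same function returns <span class="tex">$&#40;m_1&#43;1&#41;/&#40;N&#43;2&#41;$</span>, which is Laplace&#39;s original rule.</p>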
</div>mimetext/htmlrootassigneelast_run_timestampA}*cpersist_js_state·has_pluto_hook_features§cell_id$d843d0c4-d294-11ef-10b6-cb982615d58adepends_on_disabled_cells§runtimeӇµpublished_object_keysdepends_on_skipped_cells§errored$d843a338-d294-11ef-2748-b95f2af1396bqueued¤logsrunning¦outputbody<div class="markdown"><h2 id="Inference-for-\&#123;\mu_k\&#125;">Inference for <span class="tex">$\&#123;\mu_k\&#125;$</span></h2>
<p>The posterior for  <span class="tex">$\&#123;\mu_k\&#125;$</span> can be obtained through Bayes rule:</p>
<p class="tex">$$\begin&#123;align*&#125;
p&#40;\mu|D,\alpha&#41; &amp;\propto p&#40;D|\mu&#41; \cdot p&#40;\mu|\alpha&#41; \\
  &amp;\propto  \prod_k \mu_k^&#123;m_k&#125; \cdot \prod_k \mu_k^&#123;\alpha_k-1&#125; \\
  &amp;&#61; \prod_k \mu_k^&#123;\alpha_k &#43; m_k -1&#125;\\
  &amp;\propto \mathrm&#123;Dir&#125;\left&#40;\mu\,|\,\alpha &#43; m \right&#41; \tag&#123;B-2.41&#125; \\
  &amp;&#61; \frac&#123;\Gamma\left&#40;\sum_k &#40;\alpha_k &#43; m_k&#41; \right&#41;&#125;&#123;\Gamma&#40;\alpha_1&#43;m_1&#41; \Gamma&#40;\alpha_2&#43;m_2&#41; \cdots \Gamma&#40;\alpha_K &#43; m_K&#41;&#125; \prod_&#123;k&#61;1&#125;^K \mu_k^&#123;\alpha_k &#43; m_k -1&#125;
\end&#123;align*&#125;$$</p>
<p>where <span class="tex">$m &#61; &#40;m_1,m_2,\ldots,m_K&#41;^T$</span> is the count vector.</p>
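<p>A small numerical check of the conjugate update, written as a Python sketch with assumed values for <code>alpha</code> and <code>m</code>; the helper names are hypothetical. If the posterior is indeed <span class="tex">$\mathrm&#123;Dir&#125;&#40;\mu|\alpha&#43;m&#41;$</span>, then likelihood times prior divided by that posterior should not depend on <span class="tex">$\mu$</span>:</p>

```python
from math import lgamma, log


def dirichlet_logpdf(mu, alpha):
    """Log density of Dir(mu|alpha) at a point mu in the interior of the simplex."""
    log_norm = lgamma(sum(alpha)) - sum(lgamma(a_k) for a_k in alpha)
    return log_norm + sum((a_k - 1) * log(mu_k) for mu_k, a_k in zip(mu, alpha))


def log_likelihood(mu, m):
    """log prod_k mu_k**m_k."""
    return sum(m_k * log(mu_k) for mu_k, m_k in zip(mu, m))


alpha = [2.0, 2.0, 2.0]  # assumed prior pseudo-counts
m = [5, 1, 0]            # assumed observed counts
alpha_post = [a_k + m_k for a_k, m_k in zip(alpha, m)]  # posterior parameters


def log_ratio(mu):
    """log[ p(D|mu) p(mu|alpha) / Dir(mu|alpha + m) ]; constant in mu if conjugacy holds."""
    return log_likelihood(mu, m) + dirichlet_logpdf(mu, alpha) - dirichlet_logpdf(mu, alpha_post)


print(alpha_post)
print(log_ratio([0.6, 0.3, 0.1]), log_ratio([0.2, 0.2, 0.6]))
```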
</div>mimetext/htmlrootassigneelast_run_timestampA}*persist_js_state·has_pluto_hook_features§cell_id$d843a338-d294-11ef-2748-b95f2af1396bdepends_on_disabled_cells§runtime 6published_object_keysdepends_on_skipped_cells§errored$d844fa76-d294-11ef-172a-85e68842c252queued¤logsrunning¦outputbodyo<div class="markdown"><p>Setting the derivative of <span class="tex">$\tilde&#123;\mathrm&#123;L&#125;&#125;&#40;\mu&#41;$</span> to zero yields the <strong>sample proportion</strong> for <span class="tex">$\mu_k$</span> </p>
<p class="tex">$$\begin&#123;equation*&#125;
\nabla_&#123;\mu_k&#125;   \tilde&#123;\mathrm&#123;L&#125;&#125;&#40;\mu&#41; &#61; \frac&#123;m_k &#125;
&#123;\hat\mu_k &#125; - \lambda  \overset&#123;&#33;&#125;&#123;&#61;&#125; 0 \; \Rightarrow \; \hat\mu_k &#61; \frac&#123;m_k &#125;&#123;N&#125;
\end&#123;equation*&#125;$$</p>
<p>where we get <span class="tex">$\lambda$</span> from the constraint </p>
<p class="tex">$$\begin&#123;equation*&#125;
\sum_k \hat \mu_k &#61; \sum_k \frac&#123;m_k&#125;
&#123;\lambda&#125; &#61; \frac&#123;N&#125;&#123;\lambda&#125; \overset&#123;&#33;&#125;&#123;&#61;&#125;  1
\end&#123;equation*&#125;$$</p>
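<p>The stationarity condition can be checked numerically. The Python sketch below uses assumed counts, sets <span class="tex">$\lambda &#61; N$</span> as implied by the constraint, and compares the log-likelihood at the sample proportions with that of another point on the simplex:</p>

```python
from math import log


def log_likelihood(mu, m):
    """sum_k m_k log mu_k (terms with m_k = 0 contribute nothing)."""
    return sum(m_k * log(mu_k) for mu_k, m_k in zip(mu, m) if m_k > 0)


m = [2, 5, 3]                      # assumed counts
N = sum(m)
mu_hat = [m_k / N for m_k in m]    # sample proportions
lam = N                            # Lagrange multiplier from the sum-to-one constraint

# Gradient of the constrained log-likelihood at mu_hat: m_k / mu_k - lambda.
gradient = [m_k / mu_k - lam for m_k, mu_k in zip(m, mu_hat)]
print(mu_hat, gradient)

# Any other point on the simplex should not score higher.
uniform = [1 / 3, 1 / 3, 1 / 3]
print(log_likelihood(mu_hat, m), log_likelihood(uniform, m))
```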
</div>mimetext/htmlrootassigneelast_run_timestampA}*;persist_js_state·has_pluto_hook_features§cell_id$d844fa76-d294-11ef-172a-85e68842c252depends_on_disabled_cells§runtime published_object_keysdepends_on_skipped_cells§errored$d84422a6-d294-11ef-148b-c762a90cd620queued¤logsrunning¦outputbody<div class="markdown"><p>&#40;We insert this slide only to alert you to the difference between using one-hot encoded outcomes <span class="tex">$D&#61;\&#123;x_1,x_2,\ldots,x_N\&#125;$</span> as the data, versus using counts <span class="tex">$D_m &#61; \&#123;m_1,m_2,\ldots,m_K\&#125;$</span> as the data. When used as a likelihood function for <span class="tex">$\mu$</span>, it makes no difference whether you use <span class="tex">$p&#40;D|\mu&#41;$</span> or <span class="tex">$p&#40;D_m|\mu&#41;$</span>.&#41;</p>
</div>mimetext/htmlrootassigneelast_run_timestampA}*persist_js_state·has_pluto_hook_features§cell_id$d84422a6-d294-11ef-148b-c762a90cd620depends_on_disabled_cells§runtime BWpublished_object_keysdepends_on_skipped_cells§errored$d844bcfa-d294-11ef-0874-b154f3ed810bqueued¤logsrunning¦outputbody<div class="markdown"><h4 id=""><span id='ML-for-multinomial'>Maximum likelihood estimation by optimizing a constrained log-likelihood</span></h4>
<p>Of course, we shouldn&#39;t have to go through the full Bayesian framework to get the maximum likelihood estimate. Alternatively, we can find the maximum likelihood &#40;ML&#41; solution directly by optimizing the &#40;constrained&#41; log-likelihood.</p>
<p>The log-likelihood for the multinomial distribution is given by</p>
<p class="tex">$$\begin&#123;align*&#125;
\mathrm&#123;L&#125;&#40;\mu&#41; &amp;\triangleq \log p&#40;D_m|\mu&#41; &#61; \log \prod_k \mu_k^&#123;m_k&#125; &#43; \text&#123;const&#125; &#61;  \sum_k m_k \log \mu_k &#43; \text&#123;const&#125;
\end&#123;align*&#125;$$</p>
</div>mimetext/htmlrootassigneelast_run_timestampA}*.*persist_js_state·has_pluto_hook_features§cell_id$d844bcfa-d294-11ef-0874-b154f3ed810bdepends_on_disabled_cells§runtime  ӵpublished_object_keysdepends_on_skipped_cells§errored$81b5aea0-8101-46e2-a875-1058029ebf99queued¤logsrunning¦outputbody4<details>
	<summary>Click for the solution</summary>
	<div class="details-content">
		<plutoui-detail><div class="markdown"><p class="tex">$$\begin&#123;align&#125;
\overbrace&#123;\prod_&#123;k&#61;1&#125;^&#123;K&#125; \mu_k^&#123;m_k&#125;&#125;^&#123;\text&#123;likelihood &#125;p&#40;D|\mu&#41;&#125; \cdot \overbrace&#123;\frac&#123;1&#125;&#123;B&#40;\alpha&#41;&#125; \prod_&#123;k&#61;1&#125;^&#123;K&#125; \mu_k^&#123;\alpha_k -1&#125;&#125;^&#123;\text&#123;prior &#125;p&#40;\mu|\alpha&#41;&#125;  
&amp;&#61; \frac&#123;1&#125;&#123;B&#40;\alpha&#41;&#125; \prod_&#123;k&#61;1&#125;^&#123;K&#125; \mu_k^&#123;m_k &#43; \alpha_k -1&#125; \\
&amp;&#61; \frac&#123;B&#40;m&#43;\alpha&#41;&#125;&#123;B&#40;\alpha&#41;&#125; \frac&#123;1&#125;&#123;B&#40;m&#43;\alpha&#41;&#125;\prod_&#123;k&#61;1&#125;^&#123;K&#125; \mu_k^&#123;m_k &#43; \alpha_k -1&#125; \\
&amp;&#61; \underbrace&#123;\frac&#123;B&#40;m&#43;\alpha&#41;&#125;&#123;B&#40;\alpha&#41;&#125;&#125;_&#123;\text&#123;evidence &#125;p&#40;D|\alpha&#41;&#125; \,\underbrace&#123;\mathrm&#123;Dir&#125;&#40;\mu|m&#43;\alpha&#41;&#125;_&#123;\text&#123;posterior &#125;p&#40;\mu|D,\alpha&#41;&#125; 
\end&#123;align&#125; $$</p>
<p>This equation is the equivalent of the <a href="https://bmlip.github.io/course/lectures/The&#37;20Gaussian&#37;20Distribution.html#&#40;Multivariate&#41;-Gaussian-Multiplication">Gaussian multiplication formula</a> for discrete data. Note that the evidence is a scalar normalizer for given observations <span class="tex">$m$</span> and pseudo-observations &#40;&quot;prior&quot; observations&#41; <span class="tex">$\alpha$</span>.</p>
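<p>For a numerical feel, the evidence <span class="tex">$B&#40;m&#43;\alpha&#41;/B&#40;\alpha&#41;$</span> can be evaluated in log space. This is a Python sketch with hypothetical function names and assumed inputs; it relies only on <code>math.lgamma</code>:</p>

```python
from math import exp, lgamma


def log_beta(alpha):
    """log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)."""
    return sum(lgamma(a_k) for a_k in alpha) - lgamma(sum(alpha))


def log_evidence(m, alpha):
    """log p(D|alpha) = log B(m + alpha) - log B(alpha) for one-hot data with counts m."""
    return log_beta([m_k + a_k for m_k, a_k in zip(m, alpha)]) - log_beta(alpha)


# Assumed example: a coin (K = 2) with a uniform prior, one head then one tail.
alpha = [1.0, 1.0]
print(exp(log_evidence([1, 0], alpha)))  # probability of a single head
print(exp(log_evidence([1, 1], alpha)))  # probability of the sequence head, tail
```

<p>Working with <code>lgamma</code> instead of <code>gamma</code> avoids overflow when the counts become large.</p>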
</div></plutoui-detail>
	</div>
</details>
<style type="text/css">
plutoui-detail {
	display: block;
	margin-block-end: var(--pluto-cell-spacing);
}

plutoui-detail:last-child {
	margin-block-end: 0;
}

pluto-output div.summary-title-outer {
	display: inline-flex;
	vertical-align: text-top;
	width: calc(100% - 1em);
	margin-left: -1em;
	padding-left: 1em;
}

pluto-output div.summary-title-outer > div.summary-title-inner {
	display: inline-block;
}
</style>

mimetext/htmlrootassigneelast_run_timestampA}*p_persist_js_state·has_pluto_hook_features§cell_id$81b5aea0-8101-46e2-a875-1058029ebf99depends_on_disabled_cells§runtime /published_object_keysdepends_on_skipped_cells§errored$d8439866-d294-11ef-230b-dfde21aedfbfqueued¤logsrunning¦outputbody_<div class="markdown"><h4 id="prior-distribution">prior distribution</h4>
<p>Next, we need a prior for the parameters <span class="tex">$\mu &#61; &#40;\mu_1,\mu_2,\ldots,\mu_K&#41;^T$</span>. </p>
<p>In the <a href="https://bmlip.github.io/course/lectures/Bayesian&#37;20Machine&#37;20Learning.html#beta-prior">binary coin toss example</a>, we used a <a href="https://en.wikipedia.org/wiki/Beta_distribution">beta distribution</a> that was conjugate with the binomial and forced us to choose prior pseudo-counts. </p>
<p>The generalization of the beta prior to <span class="tex">$K$</span> parameters <span class="tex">$\&#123;\mu_k\&#125;$</span> is the <a href="https://en.wikipedia.org/wiki/Dirichlet_distribution">Dirichlet distribution</a>:</p>
<p class="tex">$$p&#40;\mu|\alpha&#41; &#61; \mathrm&#123;Dir&#125;&#40;\mu|\alpha&#41; &#61; \frac&#123;\Gamma\left&#40;\sum_k \alpha_k\right&#41;&#125;&#123;\Gamma&#40;\alpha_1&#41;\cdots \Gamma&#40;\alpha_K&#41;&#125; \prod_&#123;k&#61;1&#125;^K \mu_k^&#123;\alpha_k-1&#125; $$</p>
<p>where <span class="tex">$\Gamma&#40;\cdot&#41;$</span> is the <a href="https://en.wikipedia.org/wiki/Gamma_function">Gamma function</a>. </p>
<ul>
<li><p>The Gamma function can be interpreted as a generalization of the factorial function to the real &#40;<span class="tex">$\mathbb&#123;R&#125;$</span>&#41; numbers. If <span class="tex">$n$</span> is a natural number &#40;<span class="tex">$1,2,3, \ldots$</span>&#41;, then <span class="tex">$\Gamma&#40;n&#41; &#61; &#40;n-1&#41;&#33;$</span>, where <span class="tex">$&#40;n-1&#41;&#33; &#61; &#40;n-1&#41;\cdot &#40;n-2&#41; \cdots 1$</span>.</p>
</li>
</ul>
<p>As before for the Beta distribution in the coin toss experiment, you can interpret <span class="tex">$\alpha_k$</span> as the prior number of &#40;pseudo-&#41;observations that the die landed on the  <span class="tex">$k$</span>-th face.</p>
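<p>The Gamma-function identity and the Dirichlet normalization constant can be checked with Python&#39;s standard library. This is only a sketch; the helper name and the choice <code>alpha = [1, 1, 1]</code> are assumptions for illustration:</p>

```python
from math import factorial, gamma, prod

# Gamma(n) = (n - 1)! for natural numbers n.
for n in range(1, 7):
    print(n, gamma(n), factorial(n - 1))


def dirichlet_normalizer(alpha):
    """Gamma(sum_k alpha_k) / (Gamma(alpha_1) ... Gamma(alpha_K))."""
    return gamma(sum(alpha)) / prod(gamma(a_k) for a_k in alpha)


# With alpha = (1, 1, 1) the density is flat on the simplex and equals this constant.
print(dirichlet_normalizer([1, 1, 1]))
```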
</div>mimetext/htmlrootassigneelast_run_timestampA}*[[persist_js_state·has_pluto_hook_features§cell_id$d8439866-d294-11ef-230b-dfde21aedfbfdepends_on_disabled_cells§runtime $еpublished_object_keysdepends_on_skipped_cells§errored$d842fe4c-d294-11ef-15a9-a9a6e359f47dqueued¤logsrunning¦outputbody<div class="markdown"><h2 id="The-Categorical-Distribution">The Categorical Distribution</h2>
<p>Consider a toss with a <span class="tex">$K$</span>-sided die. We use a one-hot coding scheme, i.e., the outcome is encoded as </p>
<p class="tex">$$x_&#123;k&#125; &#61; \begin&#123;cases&#125; 1 &amp; \text&#123;if the throw landed on &#36;k&#36;-th face&#125;\\
0 &amp; \text&#123;otherwise&#125; \end&#123;cases&#125; \,.$$</p>
<p>Assume the probabilities</p>
<p class="tex">$$p&#40;x_&#123;k&#125;&#61;1&#41; &#61; \mu_k \quad \text&#123;with &#125; \mu_k \geq 0 \text&#123; and &#125;\sum_k \mu_k  &#61; 1 \,.$$</p>
<p>The data generating distribution for one-hot encoded outcome <span class="tex">$x &#61; &#40;x_&#123;1&#125;,x_&#123;2&#125;,\ldots,x_&#123;K&#125;&#41;$</span> &#40;and <span class="tex">$\mu &#61; &#40;\mu_1,\mu_2,\dots,\mu_K&#41;^T$</span>&#41; is then given by </p>
<p class="tex">$$p&#40;x|\mu&#41; &#61; \mu_1^&#123;x_1&#125; \mu_2^&#123;x_2&#125; \cdots \mu_K^&#123;x_K&#125;&#61;\prod_&#123;k&#61;1&#125;^K \mu_k^&#123;x_k&#125; \tag&#123;B-2.26&#125;$$</p>
<p>This generalized Bernoulli distribution is called the <a href="https://en.wikipedia.org/wiki/Categorical_distribution"><strong>categorical distribution</strong></a>.</p>
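<p>A short Python sketch of the one-hot coding scheme and the product form B-2.26. The helper names are hypothetical, faces are indexed from 0 as is usual in Python, and the probabilities are assumed values:</p>

```python
def one_hot(k, K):
    """One-hot vector of length K with a 1 at (0-based) position k."""
    return [1 if j == k else 0 for j in range(K)]


def categorical_pmf(x, mu):
    """p(x|mu) = prod_k mu_k**x_k; the one-hot vector x selects a single mu_k."""
    p = 1.0
    for x_k, mu_k in zip(x, mu):
        p *= mu_k**x_k
    return p


mu = [0.1, 0.2, 0.3, 0.4]  # assumed face probabilities of a 4-sided die
x = one_hot(2, 4)          # the throw landed on the third face
print(x, categorical_pmf(x, mu))
```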
</div>mimetext/htmlrootassigneelast_run_timestampA}*,persist_js_state·has_pluto_hook_features§cell_id$d842fe4c-d294-11ef-15a9-a9a6e359f47ddepends_on_disabled_cells§runtime published_object_keysdepends_on_skipped_cells§errored$204bec3f-6fde-48c1-b2b6-9f88d484c130queued¤logsrunning¦outputbody<div class="markdown"><h1 id='Exercises-' class="ptt-section " style="--ptt-accent: yellow;"><span>Exercises</span> </h1>
	
<style>
.ptt-section::before {
	content: "";
	display: block;
	position: absolute;
	left: -25px;
	right: -6px;
	top: -4px;
	height: 200px;
	border: 4px solid salmon;
	border-bottom: none;
	border-image-source: linear-gradient(to bottom, var(--ptt-accent), transparent);
	border-image-slice: 1;
	opacity: .7;
	pointer-events: none;
}

.big.ptt-section::before {
	height: 500px;
}
	

.ptt-section > span {
	color: color-mix(in hwb, var(--ptt-accent) 60%, black);
	@media (prefers-color-scheme: dark) {
		color: color-mix(in hwb, var(--ptt-accent) 30%, white);
	}
	font-style: italic;
}

	
</style>

</div>mimetext/htmlrootassigneelast_run_timestampA}*<persist_js_state·has_pluto_hook_features§cell_id$204bec3f-6fde-48c1-b2b6-9f88d484c130depends_on_disabled_cells§runtimeٵpublished_object_keysdepends_on_skipped_cells§errored$d8443e38-d294-11ef-25db-b16df87850f4queued¤logsrunning¦outputbody<div class="markdown"><h4 id="Discrete-Distributions-&#40;*&#41;">Discrete Distributions &#40;*&#41;</h4>
<p>Show that</p>
<ul>
<li><p>&#40;a&#41; the categorical distribution is a special case of the multinomial for <span class="tex">$N&#61;1$</span>.  </p>
</li>
<li><p>&#40;b&#41; the Bernoulli is a special case of the categorical distribution for <span class="tex">$K&#61;2$</span>.    </p>
</li>
<li><p>&#40;c&#41; the binomial is a special case of the multinomial for <span class="tex">$K&#61;2$</span>.</p>
</li>
</ul>
</div>mimetext/htmlrootassigneelast_run_timestampA}*ɰpersist_js_state·has_pluto_hook_features§cell_id$d8443e38-d294-11ef-25db-b16df87850f4depends_on_disabled_cells§runtime 
rpublished_object_keysdepends_on_skipped_cells§errored$72f24b54-ab22-4a54-9ece-7433048f4769queued¤logsrunning¦outputbodyE<div class="markdown"><h4 id="Laplace&#39;s-Generalized-Rule-of-Succession-&#40;**&#41;">Laplace&#39;s Generalized Rule of Succession &#40;**&#41;</h4>
<p>Show that Laplace&#39;s generalized rule of succession can be rewritten as a prediction that is composed of a prior prediction and a data-based correction term.</p>
</div>mimetext/htmlrootassigneelast_run_timestampA}*persist_js_state·has_pluto_hook_features§cell_id$72f24b54-ab22-4a54-9ece-7433048f4769depends_on_disabled_cells§runtime UBpublished_object_keysdepends_on_skipped_cells§errored$448d0679-b47a-4db9-ad7d-a45786350fefqueued¤logsrunning¦outputbody	<details>
	<summary>Click for the solution</summary>
	<div class="details-content">
		<plutoui-detail><div class="markdown"><ul>
<li><p>&#40;a&#41; The probability mass function of a <strong>multinomial distribution</strong> is </p>
</li>
</ul>
<p class="tex">$$	p&#40;D_m|\mu&#41; &#61;\frac&#123;N&#33;&#125;&#123;m_1&#33; m_2&#33;\ldots m_K&#33;&#125; \,\prod_k \mu_k^&#123;m_k&#125;$$</p>
<p>over the data frequencies <span class="tex">$D_m&#61;\&#123;m_1,\ldots,m_K\&#125;$</span> with constraints that <span class="tex">$\sum_k \mu_k &#61; 1$</span> and <span class="tex">$\sum_k m_k&#61;N$</span>. </p>
<p>Setting <span class="tex">$N&#61;1$</span>, we see that <span class="tex">$p&#40;D_m|\mu&#41; \propto \prod_k \mu_k^&#123;m_k&#125;$</span> with <span class="tex">$\sum_k m_k&#61;1$</span>, making the sample-space one-hot coded. This is the <strong>categorical distribution</strong>.       </p>
<ul>
<li><p>&#40;b&#41; When <span class="tex">$K&#61;2$</span>, the constraint for the categorical distribution takes the form <span class="tex">$m_1&#61;1-m_2$</span> leading to </p>
</li>
</ul>
<p class="tex">$$	p&#40;D_m|\mu&#41; \propto \mu_1^&#123;m_1&#125;&#40;1-\mu_1&#41;^&#123;1-m_1&#125;$$</p>
<p>which is associated with the <strong>Bernoulli distribution</strong>.       </p>
<ul>
<li><p>&#40;c&#41; Plugging <span class="tex">$K&#61;2$</span> into the multinomial distribution leads to <span class="tex">$p&#40;D_m|\mu&#41; &#61;\frac&#123;N&#33;&#125;&#123;m_1&#33; m_2&#33;&#125;\mu_1^&#123;m_1&#125;\left&#40;\mu_2^&#123;m_2&#125;\right&#41;$</span> with the constraints <span class="tex">$m_1&#43;m_2&#61;N$</span> and <span class="tex">$\mu_1&#43;\mu_2&#61;1$</span>. Then plugging the constraints back in we obtain </p>
</li>
</ul>
<p class="tex">$$	p&#40;D_m|\mu&#41; &#61; \frac&#123;N&#33;&#125;&#123;m_1&#33; &#40;N-m_1&#41;&#33;&#125;\mu_1^&#123;m_1&#125;\left&#40;1-\mu_1\right&#41;^&#123;N-m_1&#125;$$</p>
<p>which is the <strong>binomial distribution</strong>.</p>
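<p>These special cases can also be checked numerically. The Python sketch below uses hypothetical helper names and assumed parameter values; it compares the multinomial with <span class="tex">$K&#61;2$</span> against the binomial, and the multinomial with <span class="tex">$N&#61;1$</span> against a single categorical probability:</p>

```python
from math import comb, factorial, prod


def multinomial_pmf(m, mu):
    """N! / (m_1! ... m_K!) * prod_k mu_k**m_k."""
    coef = factorial(sum(m))
    for m_k in m:
        coef //= factorial(m_k)
    return coef * prod(mu_k**m_k for mu_k, m_k in zip(mu, m))


def binomial_pmf(m1, N, mu1):
    """N! / (m_1! (N - m_1)!) * mu_1**m_1 * (1 - mu_1)**(N - m_1)."""
    return comb(N, m1) * mu1**m1 * (1 - mu1) ** (N - m1)


# (c) K = 2: multinomial and binomial agree for every count m_1.
N, mu1 = 5, 0.3
for m1 in range(N + 1):
    print(m1, multinomial_pmf([m1, N - m1], [mu1, 1 - mu1]), binomial_pmf(m1, N, mu1))

# (a) N = 1: a one-hot count vector picks out a single mu_k.
print(multinomial_pmf([0, 1, 0], [0.2, 0.5, 0.3]))
```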
</div></plutoui-detail>
	</div>
</details>
<style type="text/css">
plutoui-detail {
	display: block;
	margin-block-end: var(--pluto-cell-spacing);
}

plutoui-detail:last-child {
	margin-block-end: 0;
}

pluto-output div.summary-title-outer {
	display: inline-flex;
	vertical-align: text-top;
	width: calc(100% - 1em);
	margin-left: -1em;
	padding-left: 1em;
}

pluto-output div.summary-title-outer > div.summary-title-inner {
	display: inline-block;
}
</style>

mimetext/htmlrootassigneelast_run_timestampA}*pDkpersist_js_state·has_pluto_hook_features§cell_id$448d0679-b47a-4db9-ad7d-a45786350fefdepends_on_disabled_cells§runtime'	2Lpublished_object_keysdepends_on_skipped_cells§errored$01c4c590-fece-49a5-8979-6e0d54f7850aqueued¤logsrunning¦outputbody}<details>
	<summary>Click for the solution</summary>
	<div class="details-content">
		<plutoui-detail><div class="markdown"><p>Derivations are in the lecture notes.        </p>
<ul>
<li><p>&#40;a&#41;</p>
</li>
</ul>
<p class="tex">$$p&#40;x_n|\mu&#41; &#61; \prod_k \mu_k^&#123;x_&#123;nk&#125;&#125; \quad \text&#123;subject to&#125; \quad \sum_k \mu_k &#61; 1 \,.$$</p>
<p class="tex">$$\log p&#40;D|\mu&#41;  &#61; \sum_k m_k \log \mu_k$$</p>
<p>where <span class="tex">$m_k &#61; \sum_n x_&#123;nk&#125;$</span>.       </p>
<ul>
<li><p>&#40;b&#41;</p>
</li>
</ul>
<p class="tex">$$\hat \mu_k &#61; \frac&#123;m_k&#125;&#123;N&#125;\,,$$</p>
<p>which is the <em>sample proportion</em>.</p>
</div></plutoui-detail>
	</div>
</details>
<style type="text/css">
plutoui-detail {
	display: block;
	margin-block-end: var(--pluto-cell-spacing);
}

plutoui-detail:last-child {
	margin-block-end: 0;
}

pluto-output div.summary-title-outer {
	display: inline-flex;
	vertical-align: text-top;
	width: calc(100% - 1em);
	margin-left: -1em;
	padding-left: 1em;
}

pluto-output div.summary-title-outer > div.summary-title-inner {
	display: inline-block;
}
</style>

mimetext/htmlrootassigneelast_run_timestampA}*FApersist_js_state·has_pluto_hook_features§cell_id$01c4c590-fece-49a5-8979-6e0d54f7850adepends_on_disabled_cells§runtimeHcpublished_object_keysdepends_on_skipped_cells§errored$d8424e52-d294-11ef-0083-fbb77df4d853queued¤logsrunning¦outputbodyL<div class="markdown"><h2 id="Preliminaries">Preliminaries</h2>
<h5 id="Goal">Goal</h5>
<ul>
<li><p>Simple Bayesian and maximum likelihood-based density estimation for discretely valued data sets</p>
</li>
</ul>
<h5 id="Materials">Materials</h5>
<ul>
<li><p>Mandatory</p>
<ul>
<li><p>These lecture notes</p>
</li>
</ul>
</li>
<li><p>Optional</p>
<ul>
<li><p><a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf">Bishop PRML book</a> &#40;2006&#41;, pp. 67-70, 74-76, 93-94</p>
</li>
</ul>
</li>
</ul>
</div>mimetext/htmlrootassigneelast_run_timestampA}*5persist_js_state·has_pluto_hook_features§cell_id$d8424e52-d294-11ef-0083-fbb77df4d853depends_on_disabled_cells§runtime /published_object_keysdepends_on_skipped_cells§errored$d844d564-d294-11ef-0454-416352d43524queued¤logsrunning¦outputbody<div class="markdown"><p>When doing ML estimation, we must obey the constraint <span class="tex">$\sum_k \mu_k  &#61; 1$</span>, which can be accomplished by a <a href="https://en.wikipedia.org/wiki/Lagrange_multiplier">Lagrange multiplier</a>. The <strong>constrained log-likelihood</strong> with Lagrange multiplier is then</p>
<p class="tex">$$\tilde&#123;\mathrm&#123;L&#125;&#125;&#40;\mu&#41; &#61; \sum_k m_k \log \mu_k  &#43; \lambda \cdot \big&#40;1 - \sum_k \mu_k \big&#41;$$</p>
<p>The method of Lagrange multipliers is a mathematical method for transforming a constrained optimization problem to an unconstrained optimization problem &#40;see <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf#page&#61;727">Bishop App.E</a>&#41;. Unconstrained optimization problems can be solved by setting the derivative to zero. </p>
</div>mimetext/htmlrootassigneelast_run_timestampA}*	persist_js_state·has_pluto_hook_features§cell_id$d844d564-d294-11ef-0454-416352d43524depends_on_disabled_cells§runtime _published_object_keysdepends_on_skipped_cells§errored$d84369a4-d294-11ef-38f7-7f393869b705queued¤logsrunning¦outputbody<div class="markdown"><h2 id="Model-specification">Model specification</h2>
<h4 id="data-generating-distribution">data-generating distribution</h4>
<p>The outcomes <span class="tex">$x_n$</span> are encoded as</p>
<p class="tex">$$x_&#123;nk&#125; &#61; \begin&#123;cases&#125; 1 &amp; \text&#123;if the &#36;n&#36;-th throw landed on &#36;k&#36;-th face&#125;\\
0 &amp; \text&#123;otherwise&#125; \end&#123;cases&#125;$$</p>
<p>and the likelihood function for <span class="tex">$\mu$</span> is now</p>
<p class="tex">$$p&#40;D|\mu&#41; &#61; \prod_n \prod_k \mu_k^&#123;x_&#123;nk&#125;&#125; &#61; \prod_k \mu_k^&#123;\sum_n x_&#123;nk&#125;&#125; &#61; \prod_k \mu_k^&#123;m_k&#125; \tag&#123;B-2.29&#125;$$</p>
<p>where <span class="tex">$m_k&#61; \sum_n x_&#123;nk&#125;$</span> is the number of throws that landed on face <span class="tex">$k$</span>. The vector <span class="tex">$m &#61; &#40;m_1,m_2, \ldots, m_K&#41;^T$</span> is known as the <strong>count vector</strong>. Note that <span class="tex">$\sum_k m_k &#61; N$</span>.</p>
<p>This distribution depends on the observations <strong>only</strong> through the &quot;observed&quot; counts <span class="tex">$\&#123;m_k\&#125;$</span>. For given counts <span class="tex">$\&#123;m_k\&#125;$</span>, <span class="tex">$p&#40;D|\mu&#41;$</span> can be interpreted as a likelihood function for <span class="tex">$\mu$</span>.</p>
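<p>To make the role of the count vector concrete, the following Python sketch builds <span class="tex">$m$</span> from a small one-hot data set and evaluates the likelihood B-2.29 in both forms. The data and the value of <code>mu</code> are assumptions chosen for illustration:</p>

```python
from math import prod

# Assumed data: N = 5 throws of a 3-sided die, one one-hot row per throw.
D = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]
mu = [0.2, 0.3, 0.5]  # assumed face probabilities

# Count vector: m_k = sum_n x_nk (column sums of the one-hot data).
m = [sum(column) for column in zip(*D)]

# Likelihood as a double product over throws and faces ...
lik_per_throw = prod(mu_k**x_nk for x_n in D for mu_k, x_nk in zip(mu, x_n))
# ... and as a single product over faces using the counts.
lik_counts = prod(mu_k**m_k for mu_k, m_k in zip(mu, m))

print(m, lik_per_throw, lik_counts)
```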
</div>mimetext/htmlrootassigneelast_run_timestampA}*3persist_js_state·has_pluto_hook_features§cell_id$d84369a4-d294-11ef-38f7-7f393869b705depends_on_disabled_cells§runtime published_object_keysdepends_on_skipped_cells§errored$3c2ee96d-18a6-45d0-a2cf-f2ebbf5e22f0queued¤logsrunning¦outputbodyD<details>
	<summary>Click for the solution</summary>
	<div class="details-content">
		<plutoui-detail><div class="markdown"><p class="tex">$$\begin&#123;align*&#125;
p&#40;&amp;x_&#123;\bullet,k&#125;&#61;1|D&#41; &#61; \frac&#123;m_k &#43; \alpha_k &#125;&#123; N&#43; \sum_k \alpha_k&#125; \\
&amp;&#61; \frac&#123;m_k&#125;&#123;N&#43;\sum_k \alpha_k&#125;  &#43; \frac&#123;\alpha_k&#125;&#123;N&#43;\sum_k \alpha_k&#125;\\
&amp;&#61; \frac&#123;m_k&#125;&#123;N&#43;\sum_k \alpha_k&#125; \cdot \frac&#123;N&#125;&#123;N&#125; &#43; \frac&#123;\alpha_k&#125;&#123;N&#43;\sum_k \alpha_k&#125;\cdot \frac&#123;\sum_k \alpha_k&#125;&#123;\sum_k\alpha_k&#125; \\
&amp;&#61; \frac&#123;N&#125;&#123;N&#43;\sum_k \alpha_k&#125; \cdot \frac&#123;m_k&#125;&#123;N&#125; &#43; \frac&#123;\sum_k \alpha_k&#125;&#123;N&#43;\sum_k \alpha_k&#125; \cdot \frac&#123;\alpha_k&#125;&#123;\sum_k\alpha_k&#125; \\
&amp;&#61; \frac&#123;N&#125;&#123;N&#43;\sum_k \alpha_k&#125; \cdot \frac&#123;m_k&#125;&#123;N&#125; &#43; \bigg&#40; \frac&#123;\sum_k \alpha_k&#125;&#123;N&#43;\sum_k \alpha_k&#125; &#43; \underbrace&#123;\frac&#123;N&#125;&#123;N&#43;\sum_k \alpha_k&#125; - \frac&#123;N&#125;&#123;N&#43;\sum_k \alpha_k&#125;&#125;_&#123;0&#125;\bigg&#41; \cdot \frac&#123;\alpha_k&#125;&#123;\sum_k\alpha_k&#125; \\
&amp;&#61; \frac&#123;N&#125;&#123;N&#43;\sum_k \alpha_k&#125; \cdot \frac&#123;m_k&#125;&#123;N&#125; &#43; \bigg&#40; 1 - \frac&#123;N&#125;&#123;N&#43;\sum_k \alpha_k&#125;\bigg&#41; \cdot \frac&#123;\alpha_k&#125;&#123;\sum_k\alpha_k&#125; \\
&amp;&#61; \underbrace&#123;\frac&#123;\alpha_k&#125;&#123;\sum_k\alpha_k&#125;&#125;_&#123;\text&#123;prior prediction&#125;&#125; &#43; \underbrace&#123;\frac&#123;N&#125;&#123;N&#43;\sum_k \alpha_k&#125; \cdot \underbrace&#123;\left&#40;\frac&#123;m_k&#125;&#123;N&#125; - \frac&#123;\alpha_k&#125;&#123;\sum_k\alpha_k&#125;\right&#41;&#125;_&#123;\text&#123;prediction error&#125;&#125;&#125;_&#123;\text&#123;data-based correction&#125;&#125;
\end&#123;align*&#125;$$</p>
<p>&#40;If you know a shorter or more elegant derivation, please post it on Piazza.&#41;</p>
<p>This decomposition is a natural consequence of Bayesian estimation, which always combines a prior-based prediction term with a likelihood-based &#40;or data-based&#41; correction term that can be interpreted as a &#40;precision-weighted&#41; prediction error.</p>
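<p>As a quick sanity check, the derivation can be verified numerically. The sketch below is only an illustration: the counts and pseudo-counts are made-up values, the function names are hypothetical, and exact fractions are used so the two sides can be compared without rounding.</p>

```python
from fractions import Fraction


def posterior_predictive(m, alpha):
    """Direct form: (m_k + alpha_k) / (N + sum(alpha)) for every k."""
    n_total = sum(m)
    alpha_total = sum(alpha)
    return [Fraction(m_k + a_k, n_total + alpha_total) for m_k, a_k in zip(m, alpha)]


def prior_plus_correction(m, alpha):
    """Decomposed form: prior prediction + gain * prediction error."""
    n_total = sum(m)
    alpha_total = sum(alpha)
    gain = Fraction(n_total, n_total + alpha_total)  # weight N / (N + sum(alpha))
    result = []
    for m_k, a_k in zip(m, alpha):
        prior_prediction = Fraction(a_k, alpha_total)
        prediction_error = Fraction(m_k, n_total) - prior_prediction
        result.append(prior_prediction + gain * prediction_error)
    return result


counts = [3, 0, 5, 1, 0, 1]         # hypothetical die counts m_k, so N = 10
pseudo_counts = [1, 1, 1, 1, 1, 1]  # hypothetical Dirichlet parameters alpha_k

direct = posterior_predictive(counts, pseudo_counts)
decomposed = prior_plus_correction(counts, pseudo_counts)
assert direct == decomposed  # first and last line of the derivation agree
```

<p>The weight on the prediction error tends to 1 as the number of observations grows, so the prediction then follows the observed frequencies more and more closely.</p>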
</div></plutoui-detail>
	</div>
</details>
<style type="text/css">
plutoui-detail {
	display: block;
	margin-block-end: var(--pluto-cell-spacing);
}

plutoui-detail:last-child {
	margin-block-end: 0;
}

pluto-output div.summary-title-outer {
	display: inline-flex;
	vertical-align: text-top;
	width: calc(100% - 1em);
	margin-left: -1em;
	padding-left: 1em;
}

pluto-output div.summary-title-outer > div.summary-title-inner {
	display: inline-block;
}
</style>

<div class="markdown"><h1 id="Code">Code</h1>
</div>
<div class="markdown"><h4 id="Maximum-Likelihood-estimation-&#40;**&#41;">Maximum Likelihood estimation &#40;**&#41;</h4>
<p>We consider IID data <span class="tex">$D &#61; \&#123;x_1,x_2,\ldots,x_N\&#125;$</span> obtained from tossing a <span class="tex">$K$</span>-sided die. We use a <em>binary selection variable</em></p>
<p class="tex">$$x_&#123;nk&#125; \equiv \begin&#123;cases&#125; 1 &amp; \text&#123;if &#36;x_n&#36; lands on &#36;k&#36;-th face&#125;\\
    0 &amp; \text&#123;otherwise&#125;
\end&#123;cases&#125;$$</p>
<p>with probabilities <span class="tex">$p&#40;x_&#123;nk&#125; &#61; 1&#41;&#61;\mu_k$</span>.         </p>
<ul>
<li><p>&#40;a&#41; Derive the log-likelihood <span class="tex">$\log p&#40;D|\mu&#41;$</span>.        </p>
</li>
<li><p>&#40;b&#41; Derive the maximum likelihood estimate for <span class="tex">$\mu$</span>.</p>
</li>
</ul>
</div>
<div class="markdown"><p>This is actually a generalization of the conjugate relation that we found for the binary coin toss:</p>
<p class="tex">$$\begin&#123;align*&#125;
\underbrace&#123;\text&#123;beta&#125;&#125;_&#123;\text&#123;posterior&#125;&#125; &amp;\propto \underbrace&#123;\text&#123;binomial&#125;&#125;_&#123;\text&#123;likelihood&#125;&#125; \cdot \underbrace&#123;\text&#123;beta&#125;&#125;_&#123;\text&#123;prior&#125;&#125;
\end&#123;align*&#125;$$</p>
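<p>To make the "generalization" concrete, the sketch below compares the two posterior updates for a coin. It assumes the standard pseudo-count update rules (add the observed counts to the prior parameters); the function names and the numbers are hypothetical.</p>

```python
def dirichlet_update(alpha, counts):
    """Posterior Dirichlet parameters: prior pseudo-counts plus observed counts."""
    return [a_k + m_k for a_k, m_k in zip(alpha, counts)]


def beta_update(a, b, heads, tails):
    """Posterior beta parameters for a coin with prior Beta(a, b)."""
    return (a + heads, b + tails)


# Hypothetical coin data: 7 heads and 3 tails, with prior pseudo-counts (2, 2).
heads, tails = 7, 3
beta_posterior = beta_update(2, 2, heads, tails)
dirichlet_posterior = dirichlet_update([2, 2], [heads, tails])

# With K = 2 categories the Dirichlet update coincides with the beta update.
assert list(beta_posterior) == dirichlet_posterior
```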
</div>
<div class="markdown"><h1 id="Summary">Summary</h1>
</div>
<div class="markdown"><h2 id="Categorical,-Multinomial-and-Related-Distributions">Categorical, Multinomial and Related Distributions</h2>
<p>In the above derivation, we noticed that the data-generating distribution for <span class="tex">$N$</span> die tosses with outcomes <span class="tex">$D&#61;\&#123;x_1,\ldots,x_N\&#125;$</span> depends on the data only through the <strong>counts</strong> <span class="tex">$m_k &#61; \sum_n x_&#123;nk&#125;$</span>:</p>
<p class="tex">$$p&#40;D|\mu&#41; &#61; \prod_n \underbrace&#123;\prod_k \mu_k^&#123;x_&#123;nk&#125;&#125;&#125;_&#123;\text&#123;categorical dist.&#125;&#125; &#61; \prod_k \mu_k^&#123;\sum_n x_&#123;nk&#125;&#125; &#61; \prod_k \mu_k^&#123;m_k&#125; \tag&#123;B-2.29&#125;$$</p>
</div>
<div class="markdown"><p>Now consider a <span class="tex">$K$</span>-sided coin &#40;e.g., a six-faced <em>die</em> &#40;pl.: dice&#41;&#41;. How should we encode outcomes? Two natural options present themselves:</p>
<h5 id="Option-1:-label-encoding">Option 1: label encoding</h5>
<p class="tex">$$x \in \&#123;1,2,\ldots,K\&#125; \,.$$</p>
<ul>
<li><p>E.g., for <span class="tex">$K&#61;6$</span>, if the die lands on the 3rd face, then <span class="tex">$x&#61;3$</span>.</p>
</li>
<li><p>This coding scheme is called <strong>label</strong> &#40;or <strong>index</strong>&#41; encoding. </p>
</li>
</ul>
<h5 id="Option-2:-one-hot-encoding">Option 2: one-hot encoding</h5>
<p class="tex">$$x &#61; &#40;x_1,\ldots,x_K&#41;^T $$</p>
<p>where <span class="tex">$x_k$</span> are <strong>binary selection variables</strong>, given by</p>
<p class="tex">$$x_k &#61; \begin&#123;cases&#125; 1 &amp; \text&#123;if die landed on &#36;k&#36;th face&#125;\\
0 &amp; \text&#123;otherwise&#125; \end&#123;cases&#125;$$</p>
<ul>
<li><p>For instance, for <span class="tex">$K&#61;6$</span>, if the die lands on the 3rd face, then <span class="tex">$x&#61;&#40;0,0,1,0,0,0&#41;^T$</span>.</p>
</li>
<li><p>This coding scheme is called a <strong>1-of-K</strong> or <strong>one-hot</strong> coding scheme.</p>
</li>
</ul>
<p>It turns out that the one-hot coding scheme is mathematically more convenient&#33;</p>
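<p>One way to see the convenience: with one-hot vectors, the probability of an outcome is a plain product in which the selection variables pick out the right factor, so no lookup or 'if'-statement is needed. The sketch below illustrates this with hypothetical helper names and made-up probabilities.</p>

```python
def to_one_hot(label, num_faces):
    """Convert a label in 1..num_faces to a 1-of-K list."""
    return [1 if k == label else 0 for k in range(1, num_faces + 1)]


def to_label(x):
    """Convert a 1-of-K list back to its label in 1..K."""
    return x.index(1) + 1


def categorical_prob(x, mu):
    """Evaluate prod_k mu_k^{x_k}; only the selected face contributes a factor."""
    p = 1.0
    for x_k, mu_k in zip(x, mu):
        p *= mu_k ** x_k
    return p


mu = [0.1, 0.1, 0.4, 0.2, 0.1, 0.1]  # hypothetical face probabilities for K = 6
x = to_one_hot(3, 6)                 # the die lands on the 3rd face

assert x == [0, 0, 1, 0, 0, 0]
assert to_label(x) == 3
assert categorical_prob(x, mu) == mu[2]
```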
</div>
<details>
	<summary>Click to see proof</summary>
	<div class="details-content">
		<plutoui-detail><div class="markdown"><p class="tex">$$\begin&#123;align*&#125;
\hat&#123;\mu&#125;_k &amp;&#61; \arg\max_&#123;\mu_k&#125; p&#40;D|\mu&#41; \\
&amp;&#61; \arg\max_&#123;\mu_k&#125; p&#40;D|\mu&#41; \cdot \underbrace&#123;\left.\mathrm&#123;Dir&#125;&#40;\mu|\alpha&#41;\right|_&#123;\alpha&#61;&#40;1,1,\ldots,1&#41;&#125;&#125;_&#123;\text&#123;uniform distr.&#125;&#125; \\
&amp;&#61; \arg\max_&#123;\mu_k&#125; \left.p&#40;\mu|D,\alpha&#41;\right|_&#123;\alpha&#61;&#40;1,1,\ldots,1&#41;&#125;  \\
&amp;&#61; \arg\max_&#123;\mu_k&#125; \left.\mathrm&#123;Dir&#125;\left&#40; \mu | m &#43; \alpha \right&#41;\right|_&#123;\alpha&#61;&#40;1,1,\ldots,1&#41;&#125; \\
&amp;&#61; \frac&#123;m_k&#125;&#123;\sum_k m_k&#125; &#61; \frac&#123;m_k&#125;&#123;N&#125;
\end&#123;align*&#125;$$</p>
<p>where we used the fact that the <a href="https://en.wikipedia.org/wiki/Dirichlet_distribution#Mode">maximum of the Dirichlet distribution</a> <span class="tex">$\mathrm&#123;Dir&#125;&#40;\mu|\alpha&#41;$</span> is attained at <span class="tex">$\mu_k &#61; &#40;\alpha_k-1&#41;/&#40;\sum_k\alpha_k - K&#41;$</span>.</p>
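<p>The last step of the proof can be checked numerically. The sketch below assumes every count is at least 1, so that all posterior parameters exceed 1 and the mode formula applies in the interior; the counts and the function name are hypothetical.</p>

```python
from fractions import Fraction


def dirichlet_mode(alpha):
    """Mode of Dir(alpha) via (alpha_k - 1) / (sum(alpha) - K), assuming alpha_k > 1."""
    num_categories = len(alpha)
    alpha_total = sum(alpha)
    return [Fraction(a_k - 1, alpha_total - num_categories) for a_k in alpha]


counts = [3, 1, 5, 1]                          # hypothetical counts m_k, N = 10
posterior_alpha = [m_k + 1 for m_k in counts]  # uniform prior: alpha = (1, ..., 1)

posterior_mode = dirichlet_mode(posterior_alpha)
ml_estimate = [Fraction(m_k, sum(counts)) for m_k in counts]

# Under a uniform prior, the posterior mode equals the relative frequencies m_k / N.
assert posterior_mode == ml_estimate
```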
</div></plutoui-detail>
	</div>
</details>
<div class="markdown"><h2 id="Maximum-Likelihood-Estimation-for-the-Multinomial">Maximum Likelihood Estimation for the Multinomial</h2>
<h4 id="Maximum-likelihood-as-a-special-case-of-Bayesian-estimation">Maximum likelihood as a special case of Bayesian estimation</h4>
<p>We can obtain the maximum likelihood estimate for <span class="tex">$\mu_k$</span> based on <span class="tex">$N$</span> throws of a <span class="tex">$K$</span>-sided die within the Bayesian framework by letting the prior for <span class="tex">$\mu$</span> approach a uniform distribution. For a Dirichlet prior <span class="tex">$\mathrm&#123;Dir&#125;&#40;\mu | \alpha&#41;$</span>, this corresponds to setting <span class="tex">$\alpha \rightarrow &#40;1, 1, \dots, 1&#41;$</span>.</p>
<p>Prove for yourself that </p>
<p class="tex">$$\begin&#123;align*&#125;
\hat&#123;\mu&#125;_k &amp;&#61; \arg\max_&#123;\mu_k&#125; p&#40;D|\mu&#41; &#61; \frac&#123;m_k&#125;&#123;N&#125;\,.
\end&#123;align*&#125;$$</p>
</div>
<secret-h1 style="
font-size: 3rem; 
border-bottom: none; 
text-shadow: -3px 3px #a2d4ff5c;
font-family: 'Vollkorn', Palatino, Georgia, serif;
color: var(--pluto-output-h-color);
font-weight: 700;
">Discrete Data and the Multinomial Distribution</secret-h1>
cell_id$d8422bf2-d294-11ef-0144-098f414c6454downstream_cells_mapupstream_cells_maptitlecell_execution_order #$d3a4a1dc-3fdf-479d-a51c-a1e23073c556$d8422bf2-d294-11ef-0144-098f414c6454$1c6d16be-e8e8-45f1-aa32-c3fb08af19ce$d8424e52-d294-11ef-0083-fbb77df4d853$d842ad86-d294-11ef-3266-253f80ecf4b7$d842d368-d294-11ef-024d-45e58ca994e0$f9977fc0-0d3f-467e-822d-72f3a338f717$d842fe4c-d294-11ef-15a9-a9a6e359f47d$d843540a-d294-11ef-3846-2bf27b7e9b30$d84369a4-d294-11ef-38f7-7f393869b705$d8439866-d294-11ef-230b-dfde21aedfbf$d843a338-d294-11ef-2748-b95f2af1396b$d843b33c-d294-11ef-195d-2708fbfba49d$d843c228-d294-11ef-0d34-3520dc97859c$d843d0c4-d294-11ef-10b6-cb982615d58a$d843defc-d294-11ef-358b-f56f514dcf93$d843efdc-d294-11ef-0f3a-630ecdd0acee$d84422a6-d294-11ef-148b-c762a90cd620$d8449f1a-d294-11ef-3cfa-4fc33a5daa00$4482e857-af6b-4459-a0a2-cd7ad57ed94f$d844bcfa-d294-11ef-0874-b154f3ed810b$d844d564-d294-11ef-0454-416352d43524$d844fa76-d294-11ef-172a-85e68842c252$63cc56b7-588a-43c3-8327-ad6367608601$acdc5bfa-7188-4a37-80e6-5026ecd1a813$204bec3f-6fde-48c1-b2b6-9f88d484c130$62b42d1d-be91-4740-bac6-b4527494959d$01c4c590-fece-49a5-8979-6e0d54f7850a$d8443e38-d294-11ef-25db-b16df87850f4$448d0679-b47a-4db9-ad7d-a45786350fef$72f24b54-ab22-4a54-9ece-7433048f4769$3c2ee96d-18a6-45d0-a2cf-f2ebbf5e22f0$93b8ac65-ac41-4a03-bddd-5f01ccb5b42d$81b5aea0-8101-46e2-a875-1058029ebf99$59fb1e66-cf05-4f2b-8027-7ff3b1a57c15last_hot_reload_time        shortpathThe Multinomial Distribution.jlprocess_statusreadypathH/home/runner/work/course/course/lectures/The Multinomial 
Distribution.jlpluto_versionv0.20.19last_save_timeA}*˪cell_order #$d8422bf2-d294-11ef-0144-098f414c6454$1c6d16be-e8e8-45f1-aa32-c3fb08af19ce$d8424e52-d294-11ef-0083-fbb77df4d853$d842ad86-d294-11ef-3266-253f80ecf4b7$d842d368-d294-11ef-024d-45e58ca994e0$f9977fc0-0d3f-467e-822d-72f3a338f717$d842fe4c-d294-11ef-15a9-a9a6e359f47d$d843540a-d294-11ef-3846-2bf27b7e9b30$d84369a4-d294-11ef-38f7-7f393869b705$d8439866-d294-11ef-230b-dfde21aedfbf$d843a338-d294-11ef-2748-b95f2af1396b$d843b33c-d294-11ef-195d-2708fbfba49d$d843c228-d294-11ef-0d34-3520dc97859c$d843d0c4-d294-11ef-10b6-cb982615d58a$d843defc-d294-11ef-358b-f56f514dcf93$d843efdc-d294-11ef-0f3a-630ecdd0acee$d84422a6-d294-11ef-148b-c762a90cd620$d8449f1a-d294-11ef-3cfa-4fc33a5daa00$4482e857-af6b-4459-a0a2-cd7ad57ed94f$d844bcfa-d294-11ef-0874-b154f3ed810b$d844d564-d294-11ef-0454-416352d43524$d844fa76-d294-11ef-172a-85e68842c252$63cc56b7-588a-43c3-8327-ad6367608601$acdc5bfa-7188-4a37-80e6-5026ecd1a813$204bec3f-6fde-48c1-b2b6-9f88d484c130$62b42d1d-be91-4740-bac6-b4527494959d$01c4c590-fece-49a5-8979-6e0d54f7850a$d8443e38-d294-11ef-25db-b16df87850f4$448d0679-b47a-4db9-ad7d-a45786350fef$72f24b54-ab22-4a54-9ece-7433048f4769$3c2ee96d-18a6-45d0-a2cf-f2ebbf5e22f0$93b8ac65-ac41-4a03-bddd-5f01ccb5b42d$81b5aea0-8101-46e2-a875-1058029ebf99$59fb1e66-cf05-4f2b-8027-7ff3b1a57c15$d3a4a1dc-3fdf-479d-a51c-a1e23073c556published_objectsnbpkginstall_time_ns   3<-instantiatedòinstalled_versionsBmlipTeachingTools1.3.1terminal_outputsnbpkg_syncT
[0m[1mResolving...[22m
[90m===[39m
[36m[1m     Project[22m[39m No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_nemhjmqnaw/Project.toml`
[32m[1m    Updating[22m[39m `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_nemhjmqnaw/Manifest.toml`
  [90m[f43a241f] [39m[93m↑ Downloads v1.6.0 ⇒ v1.7.0[39m
  [90m[44cfe95a] [39m[93m↑ Pkg v1.12.0 ⇒ v1.12.1[39m
  [90m[deac9b47] [39m[93m↑ LibCURL_jll v8.11.1+1 ⇒ v8.15.0+0[39m
  [90m[14a3606d] [39m[93m↑ MozillaCACerts_jll v2025.5.20 ⇒ v2025.11.4[39m
  [90m[458c3c95] [39m[93m↑ OpenSSL_jll v3.5.1+0 ⇒ v3.5.4+0[39m
  [90m[3f19e933] [39m[93m↑ p7zip_jll v17.5.0+2 ⇒ v17.7.0+0[39m

[0m[1mInstantiating...[22m
[90m===[39m

[0m[1mPrecompiling...[22m
[90m===[39mBmlipTeachingToolsT
[0m[1mResolving...[22m
[90m===[39m
[36m[1m     Project[22m[39m No packages added to or removed from `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_nemhjmqnaw/Project.toml`
[32m[1m    Updating[22m[39m `~/.julia/scratchspaces/c3e4b0f8-55cb-11ea-2926-15256bba5781/pkg_envs/env_nemhjmqnaw/Manifest.toml`
  [90m[f43a241f] [39m[93m↑ Downloads v1.6.0 ⇒ v1.7.0[39m
  [90m[44cfe95a] [39m[93m↑ Pkg v1.12.0 ⇒ v1.12.1[39m
  [90m[deac9b47] [39m[93m↑ LibCURL_jll v8.11.1+1 ⇒ v8.15.0+0[39m
  [90m[14a3606d] [39m[93m↑ MozillaCACerts_jll v2025.5.20 ⇒ v2025.11.4[39m
  [90m[458c3c95] [39m[93m↑ OpenSSL_jll v3.5.1+0 ⇒ v3.5.4+0[39m
  [90m[3f19e933] [39m[93m↑ p7zip_jll v17.5.0+2 ⇒ v17.7.0+0[39m

[0m[1mInstantiating...[22m
[90m===[39m

[0m[1mPrecompiling...[22m
[90m===[39menabled÷restart_recommended_msgrestart_required_msgbusy_packageswaiting_for_permission,waiting_for_permission_but_probably_disabled«cell_inputs #$d842ad86-d294-11ef-3266-253f80ecf4b7cell_id$d842ad86-d294-11ef-3266-253f80ecf4b7code0md"""
## Discrete Data: the 1-of-K Coding Scheme

Consider a coin-tossing experiment with outcomes ``x \in\{0,1\}`` (tail and head, respectively) and let ``0\leq \mu \leq 1`` represent the probability of heads. The data generating distribution for this model can be written as a [**Bernoulli distribution**](https://en.wikipedia.org/wiki/Bernoulli_distribution):

```math
 
p(x|\mu) = \mu^{x}(1-\mu)^{1-x}
```

Note that the variable ``x`` acts as a (binary) **selector** for the tail or head probabilities. Think of this as an 'if'-statement in programming.
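
The selector behaviour can be made concrete with a minimal Julia sketch. The names `bernoulli_pmf` and `bernoulli_if` and the value of `μ` are illustrative choices, not part of the course code.

```julia
# Bernoulli pmf with x ∈ {0, 1} acting as a selector in the exponents
bernoulli_pmf(x, μ) = μ^x * (1 - μ)^(1 - x)

# The same pmf written as an explicit branch (hypothetical helper)
bernoulli_if(x, μ) = x == 1 ? μ : 1 - μ

μ = 0.3                        # example probability of heads
p_head = bernoulli_pmf(1, μ)   # the exponents select μ
p_tail = bernoulli_pmf(0, μ)   # the exponents select 1 - μ
```

Both functions return the same value for every ``x \in \{0,1\}``; the exponent form is the one that generalizes to ``K`` outcomes.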

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d843b33c-d294-11ef-195d-2708fbfba49dcell_id$d843b33c-d294-11ef-195d-2708fbfba49dcodemd"""
We recognize the ``(\alpha_k)``'s as prior pseudo-counts, and the Dirichlet distribution turns out to be a [conjugate prior](https://en.wikipedia.org/wiki/Conjugate_prior) to the categorical/multinomial:

```math
\begin{align*}
\underbrace{\text{Dirichlet}}_{\text{posterior}} &\propto \underbrace{\text{categorical}}_{\text{likelihood}} \cdot \underbrace{\text{Dirichlet}}_{\text{prior}}
\end{align*}
```

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d843efdc-d294-11ef-0f3a-630ecdd0aceecell_id$d843efdc-d294-11ef-0f3a-630ecdd0aceecodemd"""
A related distribution is the distribution over count observations ``D_m=\{m_1,\ldots,m_K\}``, which is called the **multinomial distribution**,

```math
p(D_m|\mu) =\frac{N!}{m_1! m_2!\ldots m_K!} \,\prod_k \mu_k^{m_k}\,.
```
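
As a rough illustration of this formula, the sketch below evaluates the multinomial probability mass function with plain integer factorials. The function name `multinomial_pmf` and the example numbers are assumptions made for this sketch.

```julia
# Multinomial pmf for a count vector m and probability vector μ.
# Uses Int64 factorials, so this sketch is only valid for N ≤ 20.
function multinomial_pmf(m, μ)
    N = sum(m)
    coefficient = factorial(N) ÷ prod(factorial.(m))   # N! / (m_1! ⋯ m_K!)
    return coefficient * prod(μ .^ m)
end

p_single = multinomial_pmf([0, 1, 0], [0.2, 0.5, 0.3])  # N = 1 reduces to the categorical
p_binom  = multinomial_pmf([2, 1], [0.5, 0.5])          # K = 2 reduces to the binomial
```

For larger ``N``, one would typically work with log-factorials instead of integer factorials to avoid overflow.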

"""metadatashow_logsèdisabled®skip_as_script«code_folded$acdc5bfa-7188-4a37-80e6-5026ecd1a813cell_id$acdc5bfa-7188-4a37-80e6-5026ecd1a813codekeyconceptsummary()metadatashow_logsèdisabled®skip_as_script«code_folded$d843540a-d294-11ef-3846-2bf27b7e9b30cell_id$d843540a-d294-11ef-3846-2bf27b7e9b30code md"""
# Bayesian Density Estimation for a Loaded Die

Now let's proceed with learning the parameters for a model for ``N`` independent-and-identically-distributed (IID) rolls of a ``K``-sided die, based on an observed data set ``D=\{x_1,\ldots,x_N\}``. 


"""metadatashow_logsèdisabled®skip_as_script«code_folded$93b8ac65-ac41-4a03-bddd-5f01ccb5b42dcell_id$93b8ac65-ac41-4a03-bddd-5f01ccb5b42dcodemd"""

#### Evidence for the Multinomial-Dirichlet model (**) 

As above, consider the following model assumptions for $N$ tosses with a $K$-sided die with parameters $\mu = (\mu_1,\mu_2, \ldots,\mu_K)$.  

```math
\begin{align}
p(D|\mu) &= \prod_{n=1}^N \mathrm{Cat}(x_n|\mu) = \prod_{k=1}^{K} \mu_k^{m_k} \tag{likelihood}\\
p(\mu|\alpha) &= \mathrm{Dir}(\mu|\alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \mu_k^{\alpha_k -1}   \tag{prior}
\end{align}
```
where $B(\alpha) = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma(\sum_k \alpha_k)}$ is the multivariate [Beta function](https://en.wikipedia.org/wiki/Beta_function).

Work out both the model evidence and the posterior distribution for $\mu$.
"""metadatashow_logsèdisabled®skip_as_script«code_folded$f9977fc0-0d3f-467e-822d-72f3a338f717cell_id$f9977fc0-0d3f-467e-822d-72f3a338f717code٠keyconcept("", "Discrete event outcomes are typically represented via one-hot encoding, in which each outcome corresponds to a unique binary indicator vector.")metadatashow_logsèdisabled®skip_as_script«code_folded$d843d0c4-d294-11ef-10b6-cb982615d58acell_id$d843d0c4-d294-11ef-10b6-cb982615d58acode?md"""
## $(HTML("<span id='prediction-loaded-die'>Prediction of next toss for the loaded die</span>"))

Let's apply what we have learned about the loaded die to compute the probability that we throw the ``k``-th face at the next toss. 

```math
\begin{align*}
p(x_{\bullet,k}=1|D)  &= \int p(x_{\bullet,k}=1|\mu)\,p(\mu|D) \,\mathrm{d}\mu \\
  &= \int \mu_k \times  \mathrm{Dir}(\mu|\,\alpha+m) \,\mathrm{d}\mu  \\
  &= \mathrm{E}\left[ \mu_k | D\right] \\
  &= \frac{m_k + \alpha_k }{ N+ \sum_k \alpha_k}
\end{align*}
```

(You can find the mean of the Dirichlet distribution ``\mathrm{E}\left[ \mu_k \right]`` on its [Wikipedia page](https://en.wikipedia.org/wiki/Dirichlet_distribution).)

This result is simply a generalization of [**Laplace's rule of succession**](https://en.wikipedia.org/wiki/Rule_of_succession).
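
A minimal sketch of this predictive rule follows. The function name `predictive` and the counts are made up for illustration.

```julia
# Posterior predictive probabilities for the next toss:
# (m_k + α_k) / (N + Σ_k α_k) for every face k
predictive(m, α) = (m .+ α) ./ (sum(m) + sum(α))

p_next = predictive([2, 1, 1], [1, 1, 1])   # K = 3 die, uniform prior
p_coin = predictive([3, 1], [1, 1])         # K = 2: Laplace's rule (m_1 + 1) / (N + 2)
```

With ``K=2`` and ``\alpha = (1,1)``, the first entry of `p_coin` equals ``(3+1)/(4+2)``, which is Laplace's original rule of succession.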

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d843a338-d294-11ef-2748-b95f2af1396bcell_id$d843a338-d294-11ef-2748-b95f2af1396bcodejmd"""
## Inference for ``\{\mu_k\}``

The posterior for  ``\{\mu_k\}`` can be obtained through Bayes rule:

```math
\begin{align*}
p(\mu|D,\alpha) &\propto p(D|\mu) \cdot p(\mu|\alpha) \\
  &\propto  \prod_k \mu_k^{m_k} \cdot \prod_k \mu_k^{\alpha_k-1} \\
  &= \prod_k \mu_k^{\alpha_k + m_k -1}\\
  &\propto \mathrm{Dir}\left(\mu\,|\,\alpha + m \right) \tag{B-2.41} \\
  &= \frac{\Gamma\left(\sum_k (\alpha_k + m_k) \right)}{\Gamma(\alpha_1+m_1) \Gamma(\alpha_2+m_2) \cdots \Gamma(\alpha_K + m_K)} \prod_{k=1}^K \mu_k^{\alpha_k + m_k -1}
\end{align*}
```

where ``m = (m_1,m_2,\ldots,m_K)^T`` is the count vector.
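
In code, the posterior update amounts to adding the count vector to the pseudo-counts. The sketch below uses made-up values for `α`, `m`, and `μ` and also checks the exponent bookkeeping from the derivation at one point on the simplex.

```julia
α = [1, 1, 1]            # prior pseudo-counts (example values)
m = [2, 1, 1]            # observed counts (example values)
α_post = α .+ m          # parameters of the Dirichlet posterior

# Likelihood times unnormalized prior equals the unnormalized posterior
μ = [0.5, 0.3, 0.2]      # arbitrary point on the simplex
lhs = prod(μ .^ m) * prod(μ .^ (α .- 1))
rhs = prod(μ .^ (α_post .- 1))
```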

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d844fa76-d294-11ef-172a-85e68842c252cell_id$d844fa76-d294-11ef-172a-85e68842c252codemd"""
Setting the derivative of ``\tilde{\mathrm{L}}(\mu)`` to zero yields the **sample proportion** for ``\mu_k`` 

```math
\begin{equation*}
\nabla_{\mu_k}   \tilde{\mathrm{L}}(\mu) = \frac{m_k }
{\hat\mu_k } - \lambda  \overset{!}{=} 0 \; \Rightarrow \; \hat\mu_k = \frac{m_k }{N}
\end{equation*}
```

where we get ``\lambda`` from the constraint 

```math
\begin{equation*}
\sum_k \hat \mu_k = \sum_k \frac{m_k}
{\lambda} = \frac{N}{\lambda} \overset{!}{=}  1
\end{equation*}
```
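
The result can be checked numerically. The sketch below uses made-up, strictly positive counts; it verifies that ``m_k/\hat\mu_k`` equals the same multiplier ``\lambda = N`` for every ``k`` and compares the log-likelihood against one other point on the simplex.

```julia
m = [3, 1, 6]                  # example counts, all nonzero
N = sum(m)
μ_hat = m ./ N                 # sample proportions

# Stationarity condition: m_k / μ̂_k = λ for every k, with λ = N
λ = m ./ μ_hat

# Unconstrained part of the log-likelihood, Σ_k m_k log μ_k
loglik(μ) = sum(m .* log.(μ))
μ_other = [0.4, 0.2, 0.4]      # another valid probability vector, for comparison
```

Comparing `loglik(μ_hat)` with `loglik(μ_other)` is only a spot check, not a proof of optimality.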



"""metadatashow_logsèdisabled®skip_as_script«code_folded$d84422a6-d294-11ef-148b-c762a90cd620cell_id$d84422a6-d294-11ef-148b-c762a90cd620codeOmd"""
(We insert this slide only to alert you to the difference between using one-hot encoded outcomes ``D=\{x_1,x_2,\ldots,x_N\}`` as the data, versus using counts ``D_m = \{m_1,m_2,\ldots,m_K\}`` as the data. When used as a likelihood function for ``\mu``, it makes no difference whether you use ``p(D|\mu)`` or ``p(D_m|\mu)``.)

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d844bcfa-d294-11ef-0874-b154f3ed810bcell_id$d844bcfa-d294-11ef-0874-b154f3ed810bcodeBmd"""
#### $(HTML("<span id='ML-for-multinomial'>Maximum likelihood estimation by optimizing a constrained log-likelihood</span>"))

Of course, we shouldn't have to go through the full Bayesian framework to get the maximum likelihood estimate. Alternatively, we can find the maximum likelihood (ML) solution directly by optimizing the (constrained) log-likelihood.

The log-likelihood for the multinomial distribution is given by

```math
\begin{align*}
\mathrm{L}(\mu) &\triangleq \log p(D_m|\mu) = \log \prod_k \mu_k^{m_k} + \text{const.} =  \sum_k m_k \log \mu_k + \text{const.}
\end{align*}
```

"""metadatashow_logsèdisabled®skip_as_script«code_folded$81b5aea0-8101-46e2-a875-1058029ebf99cell_id$81b5aea0-8101-46e2-a875-1058029ebf99codehide_solution(
	md"""

	```math
	\begin{align}
	\overbrace{\prod_{k=1}^{K} \mu_k^{m_k}}^{\text{likelihood }p(D|\mu)} \cdot \overbrace{\frac{1}{B(\alpha)} \prod_{k=1}^{K} \mu_k^{\alpha_k -1}}^{\text{prior }p(\mu|\alpha)}  
	&= \frac{1}{B(\alpha)} \prod_{k=1}^{K} \mu_k^{m_k + \alpha_k -1} \\
	&= \frac{B(m+\alpha)}{B(\alpha)} \frac{1}{B(m+\alpha)}\prod_{k=1}^{K} \mu_k^{m_k + \alpha_k -1} \\
	&= \underbrace{\frac{B(m+\alpha)}{B(\alpha)}}_{\text{evidence }p(D|\alpha)} \,\underbrace{\mathrm{Dir}(\mu|m+\alpha)}_{\text{posterior }p(\mu|D,\alpha)} 
	\end{align} 
	```

	This equation is the equivalent of the [Gaussian multiplication formula](https://bmlip.github.io/course/lectures/The%20Gaussian%20Distribution.html#(Multivariate)-Gaussian-Multiplication) for discrete data. Note that the evidence is a scalar normalizer for given observations $m$ and pseudo-observations ("prior" observations) $\alpha$.
	""")metadatashow_logsèdisabled®skip_as_script«code_folded$d8439866-d294-11ef-230b-dfde21aedfbfcell_id$d8439866-d294-11ef-230b-dfde21aedfbfcodemd"""

#### prior distribution

Next, we need a prior for the parameters ``\mu = (\mu_1,\mu_2,\ldots,\mu_K)^T``. 

In the [binary coin toss example](https://bmlip.github.io/course/lectures/Bayesian%20Machine%20Learning.html#beta-prior), we used a [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) that was conjugate with the binomial and forced us to choose prior pseudo-counts. 

The generalization of the beta prior to ``K`` parameters ``\{\mu_k\}`` is the [Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution):

```math
p(\mu|\alpha) = \mathrm{Dir}(\mu|\alpha) = \frac{\Gamma\left(\sum_k \alpha_k\right)}{\Gamma(\alpha_1)\cdots \Gamma(\alpha_K)} \prod_{k=1}^K \mu_k^{\alpha_k-1} 
```

where ``\Gamma(\cdot)`` is the [Gamma function](https://en.wikipedia.org/wiki/Gamma_function). 

  - The Gamma function can be interpreted as a generalization of the factorial function to the real (``\mathbb{R}``) numbers. If ``n`` is a natural number (``1,2,3, \ldots``), then ``\Gamma(n) = (n-1)!``, where ``(n-1)! = (n-1)\cdot (n-2) \cdots 1``.

As before for the Beta distribution in the coin toss experiment, you can interpret ``\alpha_k`` as the prior number of (pseudo-)observations that the die landed on the  ``k``-th face.
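
For integer pseudo-counts, the Dirichlet density can be evaluated with factorials only, using ``\Gamma(n) = (n-1)!``. The sketch below relies on that restriction; the names `gamma_int` and `dirichlet_pdf` are hypothetical helpers, and a general implementation would need a real-valued Gamma function.

```julia
# Γ(n) = (n-1)! for natural numbers n (integer pseudo-counts are assumed here)
gamma_int(n) = factorial(n - 1)

# Dirichlet density at a point μ on the simplex, for integer-valued α
function dirichlet_pdf(μ, α)
    normalizer = gamma_int(sum(α)) / prod(gamma_int.(α))
    return normalizer * prod(μ .^ (α .- 1))
end

μ = [0.2, 0.3, 0.5]
p_flat   = dirichlet_pdf(μ, [1, 1, 1])   # α = (1,1,1): constant density on the simplex
p_skewed = dirichlet_pdf(μ, [2, 1, 1])   # one extra pseudo-observation of face 1
```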

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d842fe4c-d294-11ef-15a9-a9a6e359f47dcell_id$d842fe4c-d294-11ef-15a9-a9a6e359f47dcode md"""
## The Categorical Distribution

Consider a toss with a ``K``-sided die. We use a one-hot coding scheme, i.e., the outcome is encoded as 
```math
x_{k} = \begin{cases} 1 & \text{if the throw landed on $k$-th face}\\
0 & \text{otherwise} \end{cases} \,.
```

Assume the probabilities


```math 
p(x_{k}=1) = \mu_k \quad \text{with } \mu_k \geq 0 \text{ and }\sum_k \mu_k  = 1 \,.
```
The data generating distribution for one-hot encoded outcome ``x = (x_{1},x_{2},\ldots,x_{K})`` (and ``\mu = (\mu_1,\mu_2,\dots,\mu_K)^T``) is then given by 

```math
p(x|\mu) = \mu_1^{x_1} \mu_2^{x_2} \cdots \mu_K^{x_K}=\prod_{k=1}^K \mu_k^{x_k} \tag{B-2.26}
```

This generalized Bernoulli distribution is called the [**categorical distribution**](https://en.wikipedia.org/wiki/Categorical_distribution).
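
A small sketch of (B-2.26) in Julia is given below. The function name `categorical_pmf` and the example probabilities are assumptions for illustration.

```julia
# Categorical pmf for a one-hot vector x and probability vector μ (B-2.26)
categorical_pmf(x, μ) = prod(μ .^ x)

μ = [0.1, 0.2, 0.3, 0.4]     # example probabilities, summing to 1
x = [0, 0, 1, 0]             # the die landed on face 3
p = categorical_pmf(x, μ)    # every factor is 1 except μ[3]
```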

"""metadatashow_logsèdisabled®skip_as_script«code_folded$204bec3f-6fde-48c1-b2b6-9f88d484c130cell_id$204bec3f-6fde-48c1-b2b6-9f88d484c130codeexercises(header_level=1)metadatashow_logsèdisabled®skip_as_script«code_folded$d8443e38-d294-11ef-25db-b16df87850f4cell_id$d8443e38-d294-11ef-25db-b16df87850f4code)md"""
#### Discrete Distributions (*)

Show that

- (a) the categorical distribution is a special case of the multinomial for ``N=1``.  

- (b) the Bernoulli is a special case of the categorical distribution for ``K=2``.    

- (c) the binomial is a special case of the multinomial for ``K=2``.

"""metadatashow_logsèdisabled®skip_as_script«code_folded$72f24b54-ab22-4a54-9ece-7433048f4769cell_id$72f24b54-ab22-4a54-9ece-7433048f4769codemd"""

#### Laplace's Generalized Rule of Succession (**) 

Show that Laplace's generalized rule of succession can be rewritten as the sum of a prior prediction and a data-based correction term.


"""metadatashow_logsèdisabled®skip_as_script«code_folded$448d0679-b47a-4db9-ad7d-a45786350fefcell_id$448d0679-b47a-4db9-ad7d-a45786350fefcodeJhide_solution(
md"""

- (a) The probability mass function of a **multinomial distribution** is 
```math 
	p(D_m|\mu) =\frac{N!}{m_1! m_2!\ldots m_K!} \,\prod_k \mu_k^{m_k}
```
over the data frequencies ``D_m=\{m_1,\ldots,m_K\}`` with constraints that ``\sum_k \mu_k = 1`` and ``\sum_k m_k=N``. 

Setting ``N=1``, we see that ``p(D_m|\mu) \propto \prod_k \mu_k^{m_k}`` with ``\sum_k m_k=1``, making the sample-space one-hot coded. This is the **categorical distribution**.       
		
- (b) When ``K=2``, the constraint for the categorical distribution takes the form ``m_1=1-m_2`` leading to 

```math
	p(D_m|\mu) \propto \mu_1^{m_1}(1-\mu_1)^{1-m_1}
```
which is associated with the **Bernoulli distribution**.       

- (c) Plugging ``K=2`` into the multinomial distribution leads to ``p(D_m|\mu) =\frac{N!}{m_1! m_2!}\mu_1^{m_1}\left(\mu_2^{m_2}\right)`` with the constraints ``m_1+m_2=N`` and ``\mu_1+\mu_2=1``. Then plugging the constraints back in we obtain 
```math
	p(D_m|\mu) = \frac{N!}{m_1! (N-m_1)!}\mu_1^{m_1}\left(1-\mu_1\right)^{N-m_1}
```
which is the **binomial distribution**.


""")metadatashow_logsèdisabled®skip_as_script«code_folded$01c4c590-fece-49a5-8979-6e0d54f7850acell_id$01c4c590-fece-49a5-8979-6e0d54f7850acode_hide_solution(
md"""
Derivations are in the lecture notes.        
		
- (a)


```math
p(x_n|\mu) = \prod_k \mu_k^{x_{nk}} \quad \text{subject to} \quad \sum_k \mu_k = 1 \,.
```

```math
\log p(D|\mu)  = \sum_k m_k \log \mu_k
```

where ``m_k = \sum_n x_{nk}``.       

- (b)


```math
\hat \mu_k = \frac{m_k}{N}\,,
```

which is the *sample proportion*.
""")metadatashow_logsèdisabled®skip_as_script«code_folded$d8424e52-d294-11ef-0083-fbb77df4d853cell_id$d8424e52-d294-11ef-0083-fbb77df4d853codemd"""
## Preliminaries

##### Goal 

  * Simple Bayesian and maximum likelihood-based density estimation for discretely valued data sets

##### Materials        

  * Mandatory

      * These lecture notes
  * Optional

      * [Bishop PRML book](https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf) (2006), pp. 67-70, 74-76, 93-94

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d844d564-d294-11ef-0454-416352d43524cell_id$d844d564-d294-11ef-0454-416352d43524codemd"""
When doing ML estimation, we must obey the constraint ``\sum_k \mu_k  = 1``, which can be accomplished by a [Lagrange multiplier](https://en.wikipedia.org/wiki/Lagrange_multiplier). The **constrained log-likelihood** with Lagrange multiplier is then

```math
\tilde{\mathrm{L}}(\mu) = \sum_k m_k \log \mu_k  + \lambda \cdot \big(1 - \sum_k \mu_k \big)
```

The method of Lagrange multipliers is a mathematical method for transforming a constrained optimization problem to an unconstrained optimization problem (see [Bishop App.E](https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf#page=727)). Unconstrained optimization problems can be solved by setting the derivative to zero. 

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d84369a4-d294-11ef-38f7-7f393869b705cell_id$d84369a4-d294-11ef-38f7-7f393869b705code+md"""
## Model specification

#### data-generating distribution

The outcomes ``x_n`` are encoded as
```math
x_{nk} = \begin{cases} 1 & \text{if the $n$-th throw landed on $k$-th face}\\
0 & \text{otherwise} \end{cases}
```

and the likelihood function for ``\mu`` is now

```math
p(D|\mu) = \prod_n \prod_k \mu_k^{x_{nk}} = \prod_k \mu_k^{\sum_n x_{nk}} = \prod_k \mu_k^{m_k} \tag{B-2.29}
```

where ``m_k= \sum_n x_{nk}`` is the number of throws that landed on face ``k``. The vector ``m = (m_1,m_2, \ldots, m_K)^T`` is known as the **count vector**. Note that ``\sum_k m_k = N``.

This distribution depends on the observations **only** through the ''observed'' counts ``\{m_k\}``. For given counts ``\{m_k\}``, ``p(D|\mu)`` can be interpreted as a likelihood function for ``\mu``.
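
The reduction to counts can be illustrated with a small made-up data set. In the sketch below, the matrix `X` (one one-hot row per throw) and the vector `μ` are example values, not data from the course.

```julia
# N = 4 throws of a K = 3 sided die, one one-hot row per throw (made-up data)
X = [1 0 0;
     0 1 0;
     1 0 0;
     0 0 1]
μ = [0.5, 0.3, 0.2]

m = vec(sum(X, dims=1))     # count vector m_k = Σ_n x_nk

# Likelihood computed throw by throw versus directly from the counts
lik_per_throw = prod(prod(μ .^ X[n, :]) for n in 1:size(X, 1))
lik_counts = prod(μ .^ m)
```

Both computations give the same value, so any reordering of the rows of `X` leaves the likelihood unchanged.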

"""metadatashow_logsèdisabled®skip_as_script«code_folded$3c2ee96d-18a6-45d0-a2cf-f2ebbf5e22f0cell_id$3c2ee96d-18a6-45d0-a2cf-f2ebbf5e22f0codehide_solution(
md"""

```math
\begin{align*}
p(&x_{\bullet,k}=1|D) = \frac{m_k + \alpha_k }{ N+ \sum_k \alpha_k} \\
&= \frac{m_k}{N+\sum_k \alpha_k}  + \frac{\alpha_k}{N+\sum_k \alpha_k}\\
&= \frac{m_k}{N+\sum_k \alpha_k} \cdot \frac{N}{N} + \frac{\alpha_k}{N+\sum_k \alpha_k}\cdot \frac{\sum_k \alpha_k}{\sum_k\alpha_k} \\
&= \frac{N}{N+\sum_k \alpha_k} \cdot \frac{m_k}{N} + \frac{\sum_k \alpha_k}{N+\sum_k \alpha_k} \cdot \frac{\alpha_k}{\sum_k\alpha_k} \\
&= \frac{N}{N+\sum_k \alpha_k} \cdot \frac{m_k}{N} + \bigg( \frac{\sum_k \alpha_k}{N+\sum_k \alpha_k} + \underbrace{\frac{N}{N+\sum_k \alpha_k} - \frac{N}{N+\sum_k \alpha_k}}_{0}\bigg) \cdot \frac{\alpha_k}{\sum_k\alpha_k} \\
&= \frac{N}{N+\sum_k \alpha_k} \cdot \frac{m_k}{N} + \bigg( 1 - \frac{N}{N+\sum_k \alpha_k}\bigg) \cdot \frac{\alpha_k}{\sum_k\alpha_k} \\
&= \underbrace{\frac{\alpha_k}{\sum_k\alpha_k}}_{\text{prior prediction}} + \underbrace{\frac{N}{N+\sum_k \alpha_k} \cdot \underbrace{\left(\frac{m_k}{N} - \frac{\alpha_k}{\sum_k\alpha_k}\right)}_{\text{prediction error}}}_{\text{data-based correction}}
\end{align*}
```

(If you know how to do it shorter and more elegantly, please post in Piazza.)

This decomposition is the natural consequence of doing Bayesian estimation, which always involves a prior-based prediction term and a likelihood-based (or data-based) correction term that can be interpreted as a (precision-weighted) prediction error. 
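
The decomposition can also be verified numerically. The sketch below uses made-up counts and pseudo-counts; the variable names are chosen for this illustration only.

```julia
m = [2, 1, 1]                    # observed counts (example values)
α = [1, 1, 1]                    # prior pseudo-counts (example values)
N, α0 = sum(m), sum(α)

direct = (m .+ α) ./ (N + α0)    # Laplace's generalized rule of succession

prior_prediction = α ./ α0
gain = N / (N + α0)              # weight given to the data
prediction_error = m ./ N .- prior_prediction
decomposed = prior_prediction .+ gain .* prediction_error
```

As ``N`` grows relative to ``\sum_k \alpha_k``, `gain` approaches 1 and the prediction approaches the sample proportions.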
		
		""")metadatashow_logsèdisabled®skip_as_script«code_folded$59fb1e66-cf05-4f2b-8027-7ff3b1a57c15cell_id$59fb1e66-cf05-4f2b-8027-7ff3b1a57c15codemd"""
# Code
"""metadatashow_logsèdisabled®skip_as_script«code_folded$d3a4a1dc-3fdf-479d-a51c-a1e23073c556cell_id$d3a4a1dc-3fdf-479d-a51c-a1e23073c556codeusing BmlipTeachingToolsmetadatashow_logsèdisabled®skip_as_script«code_folded$62b42d1d-be91-4740-bac6-b4527494959dcell_id$62b42d1d-be91-4740-bac6-b4527494959dcodemd"""

####  Maximum Likelihood estimation (**)

We consider IID data ``D = \{x_1,x_2,\ldots,x_N\}`` obtained from tossing a ``K``-sided die. We use a *binary selection variable*

```math
x_{nk} \equiv \begin{cases} 1 & \text{if $x_n$ lands on $k$-th face}\\
    0 & \text{otherwise}
\end{cases}
```

with probabilities ``p(x_{nk} = 1)=\mu_k``.         

- (a) Derive the log-likelihood ``\log p(D|\mu)``.        
- (b) Derive the maximum likelihood estimate for ``\mu``.

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d843c228-d294-11ef-0d34-3520dc97859ccell_id$d843c228-d294-11ef-0d34-3520dc97859ccode,md"""
This is actually a generalization of the conjugate relation that we found for the binary coin toss: 

```math
\begin{align*}
\underbrace{\text{beta}}_{\text{posterior}} &\propto \underbrace{\text{binomial}}_{\text{likelihood}} \cdot \underbrace{\text{beta}}_{\text{prior}}
\end{align*}
```

"""metadatashow_logsèdisabled®skip_as_script«code_folded$63cc56b7-588a-43c3-8327-ad6367608601cell_id$63cc56b7-588a-43c3-8327-ad6367608601codemd"""
# Summary
"""metadatashow_logsèdisabled®skip_as_script«code_folded$d843defc-d294-11ef-358b-f56f514dcf93cell_id$d843defc-d294-11ef-358b-f56f514dcf93codemd"""
## Categorical, Multinomial and Related Distributions

In the above derivation, we noticed that the data generating distribution for ``N`` die tosses with data outcomes ``D=\{x_1,\ldots,x_N\}`` only depends on the **counts** ``m_k``:

```math
p(D|\mu) = \prod_n \underbrace{\prod_k \mu_k^{x_{nk}}}_{\text{categorical dist.}} = \prod_k \mu_k^{\sum_n x_{nk}} = \prod_k \mu_k^{m_k} \tag{B-2.29}
```

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d842d368-d294-11ef-024d-45e58ca994e0cell_id$d842d368-d294-11ef-024d-45e58ca994e0codeXmd"""
Now consider a ``K``-sided coin (e.g., a six-faced *die* (pl.: dice)). How should we encode outcomes? Two natural options present themselves:

##### Option 1: label encoding 

```math
x \in \{1,2,\ldots,K\} \,.
```
  - E.g., for ``K=6``, if the die lands on the 3rd face, then ``x=3``.
  - This coding scheme is called **label** (or **index**) encoding. 

##### Option 2: one-hot encoding

```math
x = (x_1,\ldots,x_K)^T 
```
where ``x_k`` are **binary selection variables**, given by
```math
x_k = \begin{cases} 1 & \text{if die landed on $k$th face}\\
0 & \text{otherwise} \end{cases}
```
  - For instance, for ``K=6``, if the die lands on the 3rd face, then ``x=(0,0,1,0,0,0)^T``.

  - This coding scheme is called a **1-of-K** or **one-hot** coding scheme.

It turns out that the one-hot coding scheme is mathematically more convenient!
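
The two encodings are easy to convert into each other. A minimal sketch, with hypothetical helper names `onehot` and `label`:

```julia
# Convert a label-encoded outcome k ∈ {1, …, K} to its one-hot vector
onehot(k, K) = [j == k ? 1 : 0 for j in 1:K]

# Recover the label from a one-hot vector
label(x) = findfirst(==(1), x)

x = onehot(3, 6)    # the die landed on the 3rd face of a six-faced die
k = label(x)
```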

"""metadatashow_logsèdisabled®skip_as_script«code_folded$4482e857-af6b-4459-a0a2-cd7ad57ed94fcell_id$4482e857-af6b-4459-a0a2-cd7ad57ed94fcodehide_proof(
md"""
```math
\begin{align*}
\hat{\mu}_k &= \arg\max_{\mu_k} p(D|\mu) \\
&= \arg\max_{\mu_k} p(D|\mu) \cdot \underbrace{\left.\mathrm{Dir}(\mu|\alpha)\right|_{\alpha=(1,1,\ldots,1)}}_{\text{uniform distr.}} \\
&= \arg\max_{\mu_k} \left.p(\mu|D,\alpha)\right|_{\alpha=(1,1,\ldots,1)}  \\
&= \arg\max_{\mu_k} \left.\mathrm{Dir}\left( \mu | m + \alpha \right)\right|_{\alpha=(1,1,\ldots,1)} \\
&= \frac{m_k}{\sum_k m_k} = \frac{m_k}{N}
\end{align*}
```

where we used the fact that the [maximum of the Dirichlet distribution](https://en.wikipedia.org/wiki/Dirichlet_distribution#Mode) ``\mathrm{Dir}(\{\alpha_1,\ldots,\alpha_K\})`` is obtained at  ``(\alpha_k-1)/(\sum_k\alpha_k - K)``.
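
A quick numerical check of the last step, assuming made-up counts and the hypothetical helper `dirichlet_mode`:

```julia
# Mode of Dir(a): (a_k - 1) / (Σ_k a_k - K), valid when all a_k > 1
dirichlet_mode(a) = (a .- 1) ./ (sum(a) - length(a))

m = [3, 1, 6]                      # example counts, all nonzero
α = ones(Int, length(m))           # uniform prior, α = (1, 1, 1)
μ_map = dirichlet_mode(m .+ α)     # posterior mode under the uniform prior
μ_ml = m ./ sum(m)                 # sample proportions
```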

		""")metadatashow_logsèdisabled®skip_as_script«code_folded$1c6d16be-e8e8-45f1-aa32-c3fb08af19cecell_id$1c6d16be-e8e8-45f1-aa32-c3fb08af19cecodePlutoUI.TableOfContents()metadatashow_logsèdisabled®skip_as_script«code_folded$d8449f1a-d294-11ef-3cfa-4fc33a5daa00cell_id$d8449f1a-d294-11ef-3cfa-4fc33a5daa00codeDmd"""
## Maximum Likelihood Estimation for the Multinomial

#### Maximum likelihood as a special case of Bayesian estimation

We can obtain the maximum likelihood estimate for ``\mu_k`` based on ``N`` throws of a ``K``-sided die within the Bayesian framework by letting the prior for ``\mu`` approach a uniform distribution. For a Dirichlet prior ``\mathrm{Dir}(\mu | \alpha)``, this corresponds to setting
``\alpha \rightarrow (1, 1, \dots, 1)``.


Prove for yourself that 

```math
\begin{align*}
\hat{\mu}_k &= \arg\max_{\mu_k} p(D|\mu) = \frac{m_k}{N}\,.
\end{align*}
```

"""metadatashow_logsèdisabled®skip_as_script«code_folded$d8422bf2-d294-11ef-0144-098f414c6454cell_id$d8422bf2-d294-11ef-0144-098f414c6454code7title("Discrete Data and the Multinomial Distribution")metadatashow_logsèdisabled®skip_as_script«code_foldedënotebook_id$77582614-4561-11f1-923d-77a47dbbfbaain_temp_dir¨metadatafrontmatterauthornameBMLIPurlhttps://github.com/bmlipdescriptionSBayesian and maximum likelihood density estimation for discretely valued data sets.